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ABSTRACT 

Fundamental  issues  of  multiple  accessing  are  identified. 
These  issues  include  transmitter  asynchronism,  variability  of  the 
set  of  active  users,  feedback,  and  degree  of  codebook  knowledge 
among  the  users.  Various  multiple  access  schemes  are  examined 
under  these  issues.  These  issues  are  subsequently  modeled  and 
analysed  using  an  information  theoretic  framework. 

We  discover  that  the  capacity  region  of  the  asynchonous 
multiple  access  channel  is  different  from  that  of  the  synchronous 
channel . 


For  communication  systems  with  users  having  random  message 
generation  time,  we  demonstrate  that  its  channel  capacity 
resembles  that  of  an  asynchronous  channel  even  though  the  users 
are  synchronous,  if  decoding  delay  is  constrained  to  be  much 
smaller  than  the  message  inter-arrival  time. 
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We  investigate  communication  with  restricted  decoder 
structure.  New  information  theoretic  quantities  that  incorporate 
the  decoding  metric  used  are  discovered  and  examined.  Using  these 
quantities,  we  provide  a  rigorous  and  novel  treatment  for  the 
theory  of  jamming.  These  mathematical  techniques  provide  insight 
for  achieving  reliable  communication  in  a  multiple  access 
environment  where  each  user  may  not  know  the  codebook  of  the 
other  users  and  a  jammer  may  be  present. 

We  then  apply  the  general  theory  developed  to  three 
specific  asynchronous  channel  without  feedback,  namely  the  OR 
channel,  the  spread  spectrum  channel  and  the  collision  channel. 
Practical  and  novel  coding  schemes  are  suggested.  Maximum 
throughput,  error  rate  and  decoding  complexity  are  analysed. 
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Chapter  1  Issues  of  Multiple  Accessing 


1.1  Multiple  accessing  schemes 


Multiple  accessing  is  the  technique  of  communication  by  a 
group  of  users  over  a  shared  channel.  An  information  theoretic 
model  of  the  multiple  access  channel  was  given  by  Shannon  [1]. 
Such  a  model,  however,  has  failed  to  yield  useful  results  for  the 
various  multiple  access  schemes  used  in  practice.  Such  schemes 
include  Time  Division  Multiple  Accessing  (TDMA) ,  the  Aloha  scheme 
[2],  the  Capetanakis  tree  algorithm  [20]  (modified  by  Gallager  [3]) 
and  Code  Division  Multiple  Accessing  (CDMA) [4,5,6] .  This  variety 
can  be  attributed  to  different  requirements  and  resources  of  the 
communication  system.  The  purpose  of  this  thesis  is  to  clarify 
the  fundamental  issues  of  multiple  accessing  so  that  a  more 
comprehensive  information  theoretic  model  can  be  developed. 
Coding  theorems  are  stated  and  proved  for  these  issues.  Practical 
and  novel  multiple  access  schemes  for  specific  channels  are 
suggested  and  analysed. 

It  would  be  illuminating  to  look  at  specific  multiple 
access  schemes  before  we  abstract  the  issues  involved  in  multiple 
accessing.  First  of  all,  we  shall  introduce  some  notations.  Let  M 
be  the  number  of  users  in  the  system.  Each  user  transmits 
symbols  from  the  alphabet  2.  The  channel  output  alphabet  is  Y.  We 
assume  that  the  symbols  transmitted  by  the  M  users  are 
synchronous  (a  condition  to  be  relaxed  later  for  individual 
channels).  The  channel  output  y,  given  that  the  M  users  transmit 


the  symbols  x,  ,  x2,...,xrt  respectively,  occurs  according  to  the 
probabilities  P(y/x  (  x1...xM).  The  throughput  of  a  scheme  is  the 
sum  of  the  information  rates  for  all  M  users,  normalized  by  the 
maximum  throughput  for  the  case  when  the  channel  is  used  by  a 
single  user.  Using  these  notations,  the  four  schemes  mentioned  in 
the  previous  paragraph  and  the  classical  multiple  access  channel 
of  Shannon  can  be  characterized  as  follows 

Scheme  1  Time  division  multiple  accessing 

Channel:  The  collision  channel  defined  as  follows. 

X  *  (2n  packets  each  of  length  n)  U  {idle! 

Y  =  {2W  packets  each  of  length  n)  U  (idle)  U  (collision) 

P(y/x,  . .  .xM  ) :  y  *  xz  if  for  some  i,  xt  =  packet  and  Xj  =  idle 
for  all  j  4  i;  y  ■  idle  if  all  x^  *  idle;  y  «  collision  if  more 
than  one  x2  '  s  are  packets. 

M:  fixed 

Access  scheme:  Time  is  divided  into  frames.  Each  frame  consists 
of  M  slots.  Each  user  transmits  its  packet  in  its  assigned  slot. 
The  user  transmits  idle  if  it  does  not  transmit  a  packet. 

Timing:  Slot  synchronization  and  frame  synchronization  required. 

Throughput :  1 

Scheme  2  The  slotted  Aloha  scheme  [2] 


Channel:  The  collision  channel  defined  previously. 

M:  Large,  each  user  generates  packets  with  Poisson  statistics. 

Access  scheme:  Transmit  a  packet  once  generated.  If  the 
transmission  results  in  a  collision,  the  user  retransmits  the 
packet  after  a  random  delay.  Repeated  retransmission  until  the 
packet  is  successfully  transmitted. 

Timing:  Slot  synchronization  required  for  slotted  Aloha. 

-I 

Throughput:  e  *  0.368.  This  simple  scheme,  however,  is 

unstable  in  the  sense  that  the  channel  soon  becomes  flooded  with 
retransmission. 

Effect  of  symbol  asynchronism:  When  the  packets  are  not 
synchronized,  we  have  the  pure  Aloha  scheme.  A  packet  is  totally 
erased  if  it  collides,  even  partially,  with  other  packets.  The 
throughput  of  pure  Aloha  is  e  '/2* 

Scheme  3  The  Tree  algorithm  {improved  version  of  Gallager  [3]) 
Channel:  The  collision  channel 

M:  Large,  each  user  generates  packets  with  Poisson  statistics 
with  total  rate  X  for  all  users. 

Access  scheme:  each  packet  is  labeled  with  its  generation  time. 
The  algorithm  tries  to  resolve  conflicts  resulting  from 
transmitting  packets  in  a  time  period  called  an  epoch.  The 
algorithm  works  as  shown  in  figure  1.1.1.  There  are  two  kinds  of 


FIRST  HALF  OF 


GET  NEW  EPOCH:  Epoch  start  from  end  of  last  epoch  with  length  1.08  A 


TRANSMIT  (*j  :  All  packets  with  generation  time  in  the  period  * 

transmit  in  the  next  slot. 

Figure  1.1.1  The  modified  version  of  the  tree  algorithm. 
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states  (circle  and  square)  in  the  flow  chart.  The  circles 

redefine  the  epoch.  The  squares  authorize  the  set  of  users  with 
generation  time  in  a  certain  subinterval  of  the  epoch  (shown  as 

the  argument  in  the  function  Transmit(*))  to  transmit  in  the  next 

slot.  The  channel  output  of  the  transmission  determines  the  next 
state  in  the  flow  chart. 

Timing:  Slot  synchronization  required. 

Throughput:  .487 

Scheme  4  Code  division  multiple  accessing 

Me  shall  consider  code  division  multiple  accessing  for  three 
types  of  channel,  namely  the  OR  channel,  the  spread  spectrum 

channel  and  the  collision  channel.  A  more  detailed  description 
and  analysis  of  these  channels  and  coding  schemes  are  found  in 
chapters  5,  6  and  7  of  this  thesis.  For  this  scheme,  the  number 
of  users  can  be  large,  with  only  a  small  portion  of  the  users 
active  at  a  time.  Those  users  that  are  not  active  ’transmit'  the 
idle  symbol,  which  we  assume  to  be  in  X.  The  message  of  each 
user  is  redundantly  coded  for  achieving  reliable  communication  in 
the  presence  of  interference  due  to  the  signals  of  the  other 
users.  We  say  that  joint  decoding  is  performed  if  the  codewords 
sent  by  the  other  users  are  also  estimated  for  the  sake  of 
obtaining  a  better  estimate  of  one's  message.  We  say  that  joint 
decoding  is  not  performed  if  the  signals  of  the  other  users  are 
treated  as  memoryless  noise  in  the  estimation  of  one's  message. 
No  feedback  is  required  for  these  schemes.  Symbol  synchronization 


(as  shall  be  examined  in  each  case)  and  frame  (a  frame  is  the 
length  of  a  codeword)  synchronization  are  not  required.  The  three 
channels  are  described  as  follows 

A.  The  OR  channel  [4] 

Channel : 

X  -  {0,1)  ,  idle  -  0 
*  -  (0,1] 

P<y/x( . ..xH) :  y  «  0  if  all  xi  *  0;  1  otherwise. 

Access  scheme:  One  scheme  uses  pulse  position  modulation  and 
convolutional  encoding.  A  pulse  position  modulation  symbol  is  a 
sequence  with  a  one  (called  a  pulse)  among  n-1  zeros,  and  hence 
there  are  n  pulse  positions.  The  binary  (0  or  1)  message  stream 
is  fed  into  a  convolutional  encoder  shown  in  figure  1.1.2  which 
puts  out  pulse  position  modulation  symbols.  The  pulses  sent  by 
the  other  users  become  interference  to  a  receiver. 

Throughput:  .69  if  joint  decoding  is  not  performed;  1  if  joint 
decoding  is  performed. 

The  effect  of  symbol  asynchronism:  In  continuous  time,  a  pulse 
has  a  duration  of  T  ,  thus  symbol  asynchronism  may  occur.  By 
limiting  the  temporal  resolution  of  the  pulse  positions  and  using 
a  decision  rule  which  obtains  a  discrete  estimate  of  the  pulse 
position,  we  may  model  the  channel  in  a  symbol  synchronous 


manner . 


B. 


The  spread  spectrum  channel  (or  the  adder  channel)  [5] 


Channel: 

Z  »  {-1,+1}  0  { 0 J  ;  the  idle  symbol  0  may  not  be  used  during  the 
transmission  of  a  message. 

Y  *  The  set  of  integers 

M 

P(y/x. . . .xM) :  y  «  Z  x  .  . 

In  ^ 


Access  scheme:  Each  user  is  assigned  an  n  bit  long  Pseudo-Noise 

n 

(PN)  sequence  {c^  Ij.i  •  cA*j  €  (±11*  The  binary  {0,1}  message 

stream  {...u.  ,  u-  ,  u;  .  u .  .  u*  . ..}  is  encoded  as  the  binary 
{0,1}  data  stream  { . .  .x^_2  x^,  x^  xix  xi#2  ...}.  Define  l{c;  A 

-n  y,  * 

-  '  and  0{c^}  *  •  The  code  stream  that  is 

sent  by  the  transmitter  comprises  the  sequences  x  i  *  {ct- ■ } 
transmitted  successively  for  the  consecutive  k’s.  The  coding 
scheme,  realized  using  continuous  waveform,  is  shown  in  figure 
1.1.3. 


Throughput:  .721  if  joint  decoding  is  not  performed. 

The  effect  of  symbol  asynchronism:  In  continuous  time,  a  symbol 
is  realized  as  a  chip  of  duration  nr  as  shown  in  figure  1.1.3. 
Symbol  asynchronism  occurs  when  the  chips  are  not  synchronized 
for  the  users.  It  is  shown  in  chapter  6  that  symbol  asynchronism 
does  not  affect  throughput  for  a  specific  structure  of  the 
demodulator-decoder . 
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C.  The  collision  channel  [6] 

Channel:  the  collision  channel 

Access  scheme:  Each  user  sends  packets  infrequently  and 
successive  packets  are  encoded.  This  is  achieved  by  having  the 
message  redundantly  coded,  then  interleaved  and  broken  up  into 
packets.  Collision  results  in  erasure  of  the  entire  packet,  but 
coding  enables  the  reconstruction  of  the  original  message. 

Throughput:  e  *  «  .368 

The  effect  of  symbol  asynchronism:  When  the  packets  are  not 

synchronized,  partial  overlapping  of  packets  results  in  total 

loss  of  the  packets.  Using  a  technique  called  super-packet ing  to 

-1 

be  described  in  chapter  7,  the  e.  •  throughput  is  not  affected  by 
symbol  asynchronism. 

The  classical  multiple  access  channel  [1] 

Channel:  General 
M:  fixed 

Access  scheme:  Joint  decoding  assumed.  Coding  theorem  proved  only 
for  the  case  without  feedback.  It  has  been  shown  that  feedback 
may  enlarge  the  capacity  region,  (example:  The  two-user  adder 
channel  of  Wolf  [1]  with  X  =  {+1,-1}  and  y  =  x(  +  xz  .) 

Timing:  Symbol  synchronization  and  frame  synchronization 
required.  Recent  work  by  Cover  et.  al.  [7]  shows  that  channel 


capacity  is  not  affected  if  the  code  frames  of  the  users  are 
shifted  slightly  with  respect  to  each  other. 


Throughput:  The  capacity  region  for  the  two-user  synchronous 
multiple  access  channel  is 

iR  ■  convex  hull  D  # 

P,(Z,  )fPa(X2) 

rPtx.XjyJ-PU,  )P1(xt)P(y/xl  xa) 

where  (R,,Ra)  iff 

R  ,  <  I  (X  , ; Y/Xa) 

R  j  <  I(Xa;Y/X() 

R  ,  +  R2  <  I(X,X3;Y) 

The  above  result  has  been  generalised  for  the  case  of  M>2  and  the 
case  of  correlated  sources  [1]. 

Several  questions  arise  if  we  compare  these  five  schemes 
closely. 

1.  The  classical  theory  of  multiple  accessing  gives  a 
throughput  of  1  for  the  collision  channel,  instead  of 
.368.  While  the  throughput  of  TDMA  agrees  with  the 
classical  theory,  the  throughput  of  the  Aloha  scheme  and 
the  tree  algorithm  are,  both  less  than  .5.  What  accounts 
for  the  difference  in  throughput? 
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2.  What  accounts  for  the  higher  throughput  of  the  tree 
algorithm  compared  with  the  Aloha  algorithm? 

3.  The  use  of  code  division  multiple  accessing  for  the 
collision  channel  achieves  the  same  throughput  as  the 
Aloha  scheme,  yet  without  feedback.  Can  the  Aloha  channel 
(that  is,  the  collision  channel  with  feedback  about  the 
occurrence  of  collision)  achieve  a  higher  throughput?  If 
not,  why  is  it  that  the  feedback  of  the  Aloha  channel 
cannot  increase  throughput  beyond  that  of  the  code 
division  multiple  access  scheme? 

4.  Why  does  feedback  enlarge  the  capacity  region  of  the 
two-user  adder  channel  of  Wolf? 

5.  How  does  joint  decoding  improve  the  throughput  of  the  OR 
channel? 

We  shall  next  abstract  the  fundamental  issues  of  multiple 
accessing  and  answer  the  above  questions  for  the  purpose  of 
illustrating  these  issues. 


1.2 


The  issue  of  synchronization 


Two  forms  of  synchronism  appear  in  the  examples  in  the 
previous  section,  namely  symbol  synchronism  and  frame 
synchronism.  With  symbol  asynchronism,  the  characterization  of 
the  channel  output  is  quite  cumbersome  and  has  to  be  described 
individually  for  each  channel.  The  analysis  in  this  thesis  shall 
assume  symbol  synchronism  except  in  chapters  5,  6  and  7  when 
symbol  asynchronism  is  treated  individually  for  the  OR  channel, 
the  spread  spectrum  channel  and  the  collision  channel. 

The  more  important  issue  is  frame  synchronism.  Frame 
synchronism  is  the  degree  of  agreement  of  a  common  time  frame 
among  the  users.  Each  user  is  assumed  to  have  a  clock  with  the 
same  frequency,  .  but  may  have  different  initializations. 
Asynchronism  results  from  the  uncertainty  about  the 
initializations  of  the  clocks  of  other  users.  It  is  worth 
emphasizing  that  we  are  dealing  with  asynchronism  among  the 
transmitters,  not  between  the  transmitter-receiver  pairs. 
Synchronization  between  the  transmitter  and  the  receiver  can  be 
achieved  by  the  transmission  of  preambles  for  synchronizing  the 
receiver,  or  by  simply  synchronizing  the  clocks  at  the 
transmitter  and  the  receiver  beforehand. 

Why  is  a  common  time  frame  useful?  Many  multiple  access 
schemes  try  to  reduce  conflict  or  interference  among  the  users  so 
as  to  improve  channel  throughput.  This  sometimes  requires  a 
common  time  frame.  For  example,  TDMA  requires  perfect  frame 
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synchronization  so  that  the  system  would  be  collision  free.  A 
common  time  frame  also  allows  the  use  of  time  sharing.  Suppose 
we  have  two  multiple  access  schemes  that  achieve  two  sets  of 
transmission  rates  for  the  users.  By  using  the  first  scheme  a 
fraction  of  the  time  and  the  second  scheme  the  remaining  fraction 
of  the  time,  the  convex  combination  of  the  two  sets  of  rates  can 
be  achieved.  This  explains  the  convex  hull  required  for 
characterizing  the  capacity  region  for  the  classical  synchronous 
multiple  access  channel.  Consider  the  example  of  the  collision 
channel.  The  capacity  regions  of  figure  1.2.1a  and  b  are 
obviously  achievable  by  allowing  a  single  user  to  transmit  all 
the  time.  Figure  1.2.1c  gives  the  convex  hull  of  figure  1.2.1a 
and  b.  It  happens  that  this  convex  hull  includes  the  entire 
capacity  region.  Consequently,  the  rate  of  one-half  is  achievable 
for  the  two  users  at  the  same  time. 

Synchronization  is  a  matter  of  degree.  A  precise 
definition  of  the  degree  of  asynchronism  will  be  given  in  chapter 
2  while  the  intuitive  concept  is  introduced  here.  Perfect 
asynchronism  happens  when  each  transmitter  knows  absolutely 
nothing  about  the  initialization  of  the  clocks  of  the  other 
transmitters.  At  the  other  extreme,  we  have  ideal  TDMA  for  which 
a  common  time  frame  makes  perfect  elimination  of  collision 
possible.  In  between  these  two  extremes,  the  clocks  may  differ  by 
some  finite  and  unknown  differences.  For  the  TDMA  scheme,  we  may 
use  guard  time  to  avoid  collision.  The  system  will  be  collision 
free  if  the  guard  time  is  larger  than  the  maximum  possible  clock 
difference.  Consequently,  a  capacity  of  one  can  be  achieved  by 


Figure  1.2.1a 
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using  round-robin  TDMA  with  sufficiently  long  slots  for  each  user 
so  that  the  guard  time  is  negligible  in  proportion.  The  cost, 
however,  is  that  the  messages  suffer  a  delay  (roughly  the  length 
of  a  frame  for  the  case  of  TDMA)  that  is  much  longer  than  the 
degree  of  asynchronism  (the  minimum  guard  time  required  to  avoid 
conflict  for  TDMA).  In  general,  the  capacity  region  for  the 
multiple  access  channel  with  mild  asynchronism  is  the  same  as 
that  with  perfect  synchronism.  This  fact  is  proved  by  Cover  et. 
al.  [7],  Their  proof  utilizes  time  sharing  of  two  codes  for  the 
ordinary  multiple  access  channel  and  uses  maximum  likelihood 
decoding  over  shifts  of  the  hypothesized  transmitter  codewords  as 
shown  in  figure  1.2.2.  Instead  of  having  a  guard  time,  .the  two 
coding  frames  are  allowed  to  overlap,  and  the  overlapped  region 
is  ignored  in  the  decoding.  In  the  proof,  the  shift  d  (or 
equivalently,  the  length  of  the  overlapped  region)  is  assumed 
bounded  and  the  length  of  each  frame,  say  n,  is  substantially 
larger  than  d.  Their  argument  then  lets  d  (and  consequently  n)  go 
to  infinity,  with  d/n  goes  to  zero. 

However,  frame  synchronization  is  often  inaccurate  and 
difficult  to  achieve  in  practice.  For  users  with  small  data  rate, 
poorly  synchronized  clocks  may  require  a  guard  time  much  longer 
than  the  duration  of  a  data  burst.  To  achieve  the  synchronous 
capacity  would  require  impractically  long  frames  which  cause 
intolerable  delay.  With  the  trend  of  increasing  bandwidth  per 
link  and  increasing  number  of  users  (often  transmitting  in  small 
bursts),  it  is  often  convenient  to  assume  that  the  channel  is 


perfectly  asynchronous. 


In  essence,  we  want  the  length  of  the 


First  frame 


Second  frame 


codewords  n  to  be  smaller  than  the  shifts  d. 

Chapter  2  proves  that  the  capacity  region  for  the  two-user 
asynchronous  multiple  access  channel  without  feedback  is 

&  *  0  A 

...  :P(x(  x1y)=P(x,  JPjx^Pty/x.x*) 

where  (R(  ,R3)t^  iff 
Rv  <I(X,;Y/X2) 

R;  <  I(Xa;Y/X,) 

R  ,  +  Rj  <  I  (X,  X2 ; Y) 

It  is  noteworthy  that  the  capacity  of  the  synchronous  channel  is 
the  convex  hull  of  the  capacity  region  defined  above.  This  fact 
also  holds  for  more  than  two  users.  The  time  sharing  argument, 
which  achieves  the  convex  hull  region  for  the  asynchronous 
channel,  cannot  be  used  for  the  perfectly  asynchronous  channel. 

Consider  again  the  example  of  the  two-user  collision 
channel,  this  time  with  perfect  asynchronism.  The  capacity  region 
is  given  in  figure  1.2. Id,  which  is  not  convex  and  is  smaller 
than  that  of  the  synchronous  channel  in  figure  1.2.1c.  For  the  M 
user  asynchronous  collision  channel  with  equal  transmission  rates 

for  the  users,  the  total  throughput  can  be  shown  to  be  less  than 

M'1  .  -i 

(1-1/M)  ,  which  approaches  e  for  large  M. 
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The  issue  of  uncertainty  about  the  set  of  simultaneous 


users 


The  paradox  of  e  versus  1  still  remains  unexplained. 
Consider  a  system  with  M  users,  each  generating  packets  at  a 
Poisson  rate  of  A/M  (packets  per  slot),  0<^<1.  Suppose  all  M 
transmitters  are  perfectly  synchronized.  With  sufficient 
buffering,  a  throughput  of  1  can  be  achieved  by  using  round-robin 
TDMA.  No  feedback  is  required.  The  Aloha  scheme  and  the  tree 
algorithm,  both  requiring  feedback,  seem  to  have  a  much  smaller 
throughput.  Lack  of  synchronization  does  not  cause  such 
discrepancy  because  supplying  synchronized  clocks  for  each  user 
of  the  Aloha  channel  or  the  tree  algorithm  does  not  seem  to  help. 
It  may  be  pointed  out  that  M  is  assumed  infinite  for  these  two 
schemes.  However,  an  infinite  M  never  occurs  in  practice. 

The  paradox  disappears  if  the  communication  system  imposes 
a  maximum  allowable  delay  D  (in  terms  of  slots)  for  the  reception 
of  the  packets  from  its  generation  time.  More  precisely,  if  D 
satisfies  the  condition  M/[2(l-  A  )]  >>  D  >>  l/\  ,  then  the 
capacity  of  the  collision  channel  is  e  1  even  though  the  users  are 
synchronized.  We  shall  give  an  intuitive  explanation  of  this 
result  here,  while  chapter  3  will  give  a  rigorous  argument  for 
the  general  channel. 


First,  we  examine  the  average  delay  for  round-robin  TDMA, 
which  can  be  viewed  as  an  M  server  queue  with  exponential 
inter-arrival  time  and  deterministic  service  time.  The  average 
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delay  can  be  shown  to  be  M/2  +  M  /[2(1-A)]  +1,  in  which  the 
first  term  accounts  for  the  random  arrival  time  of  a  packet 
within  a  round-robin  cycle,  the  second  term  accounts  for  the 
queueing  delay,  and  the  third  term  accounts  for  the  transmission 
time  of  the  packet  over  the  channel,  that  is,  a  slot.  Ignoring 
the  transmission  time,  the  average  delay  simplifies  to 
M/[2(l-A)].  For  given  A  (which  may  be  close  to  one  in  order  to 
achieve  a  throughput  close  to  one),  average  delay  is  proportional 
to  M.  For  systems  with  large  M  (say  1000),  D  becomes  intolerably 
large.  We  are  interested  in  the  case  when  the  maximum  tolerable 
delay  D  is  much  smaller  than  M/[2(l-X)]. 

On  the  other  hand,  we  cannot  have  the  maximum  tolerable 
delay  too  small  without  decreasing  the  reliability  of 
communication.  The  number  of  packets  generated  within  D  equals 
Xd+ONdA  ) .  When  D  is  just  a  few  1/A  's,  the  number  of  packets 
generated  within  D  is  small  and  highly  variable,  thus  there  may 
be  more  packets  to  be  transmitted  within  the  allowable  delay  D 
than  the  channel  can  handle. 

Therefore,  the  case  of  interest  satisfies  the  condition 
M/[2(l-\)]  »  D  >>  1/A  .  Since  M/[2(l-A)]  »  D,  most  users  would 
have  at  most  one  packet  to  send  within  D.  Since  D  >>  1/A  ,  the 
set  of  active  users  would  have  size  N  which  is  approximately  dA  , 
by  the  law  of  large  numbers.  Thus,  there  is  an  uncertainty  about 
which  combination  of  N  users  is  active  within  a  set  of  M  users. 

The  obvious  question  then  is  how  to  communicate  reliably 
in  the  absence  of  feedback.  Conflict  free  scheduling  is 
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impossible  in  this  case.  The  solution  is  to  allow  the  users  to 
transmit  asynchronously  (even  though  they  may  have  a  synchronized 
clock!)  in  the  sense  that  they  transmit  as  soon  as  they  have  a 
message.  The  encoded  message  is  spread  over  a  time  interval 
comparable  to  D  to  ensure  reliable  communication. 

Chapter  3  shows  that  random  coding  can  achieve  the 
asynchronous  capacity  for  this  channel  model.  Each  user  would 
have  to  transmit  a  preamble  to  identify  himself  as  the  current 
user  of  the  channel.  For  the  converse,  we  show  that  reliable 
communication  is  impossible  above  certain  asynchronous 
sum-capacity,  which  will  be  defined  in  chapter  3. 


1.4 


The  issue  of  feedback 


The  multiple  access  channel  with  feedback  is  modeled  in 
figure  1.4.1.  For  the  classical  one  way  memoryless  channel,  it 
is  well  known  that  feedback  cannot  improve  channel  capacity.  For 
multiple  accessing,  it  has  been  shown  that  feedback  improves  the 
capacity  region  of  the  two-user  adder  channel  [1],  the  additive 
Gaussian  noise  channel  [1],  as  well  as  increases  the  throughput 
of  the  tree  algorithm  beyond  the  e  *  capacity  for  the  collision 
channel  without  feedback.  It  is  important  to  understand  how 
feedback  affects  the  operation  of  the  multiple  access  channel, 
and  why  the  capacity  region  of  the  multiple  access  channel  may  be 
enlarged  by  feedback. 

There  are  three  plausible  reasons  for  the  enlargement  of 
the  capacity  region  in  the  presence  of  feedback.  First,  the  use 
of  feedback  may  improve  synchronization  among  the  transmitters  so 
that  signaling  interference  among  the  users  can  be  minimized. 
Second,  feedback  may  enable  us  to  achieve  a  probability 
distributions  for  the  channel  output  that  is  not  achievable  by 
independent  probability  distributions  for  the  inputs  of  the 
users.  Third,  feedback  may  provide  information  about  the  set  of 
simultaneous  users,  thus  helping  to  achieve  better  scheduling 
during  retransmission. 

One  example  that  illustrates  the  first  two  reasons  is  the 
two-user  noiseless  adder  channel  of  Wolf,  with  its  capacity 
region  shown  in  figure  1.4.2.  With  full  cooperation  between  the 


1.4.2  The  capacity  region  of  the  two-user  adder  channel 


two  users,  the  capacity  of  logj.  3  bits  per  symbol  can  be  achieved 
by  using  the  letters  in  the  three  level  output  alphabet  {—2,0,2} 
equiprobably .  In  the  absence  of  cooperation  and  feedback,  an 
equiprobable  output  alphabet  cannot  be  achieved.  The  best 
throughput  is  achieved  by  having  each  user  transmitting 
independently,  with  equal  probability  distribution  for  the  input 
alphabet  {-1,1}.  The  resulting  output  distribution  is  then 
P(y=-2)  =  P(y*2)  =  1/4  and  P(y=0)  *  1/2.  Consequently,  the 
entropy  of  Y  is  1.5  bits  per  symbol.  The  channel  throughput 
without  cooperation,  which  equals  the  entropy  of  Y  in  this  case, 
is  less  than  the  throughput  of  loga  3  bits  for  the  case  with  full 
cooperation.  Consider  now  the  feedback  of  y  to  the  two 
transmitters.  By  observing  y,  the  first  transmitter  can  derive 
the  code  symbol  of  the  second  transmitter  simply  by  subtracting 
from  y  the  symbol  the  first  transmitter  sent.  Similarly,  the 
second  transmitter  can  derive  the  code  symbol  sent  by  the  first 
transmitter.  Therefore,  locations  where  there  are  ambiguities  (to 
the  receivers)  about  the  symbols  sent  by  the  transmitters  (that 
is,  where  the  channel  output  is  0)  may  be  resolved  cooperatively 
by  retransmission.  Hence  the  transmitters  can  cooperate  fully  in 
resolving  these  ambiguities  during  retransmission,  thus  achieving 
a  higher  throughput  for  the  channel.  The  two  transmitters  must  be 
synchronized  during  retransmission  to  achieve  full  cooperation. 
The  observation  of  the  channel  output  may  help  to  synchronize  the 
users. 

If  feedback  does  not  help  to  synchronize  the 
retransmission  (or  it  is  ignored  for  the  purpose  of 
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synchronization),  then  the  capacity  region  cannot  be  enlarged  by 
feedback.  For  the  Aloha  scheme,  retransmissions  are  not 
synchronized  among  the  users,  which  explains  why  its  capacity  is 
the  same  as  that  of  the  collision  channel  without  feedback. 

Neither  the  tree  algorithm  nor  the  Aloha  scheme  explicitly 
require  a  synchronized  clock  for  each  user.  For  the  Aloha  scheme, 
feedback  can  be  asynchronous  in  the  sense  that  feedback  about 
occurrences  of  collisions  can  have  randomly  large  delay  without 
affecting  the  throughput.  The  tree  algorithm,  in  contrast, 
requires  synchronized  feedback,  which  helps  to  achieve  a  certain 
degree  of  synchronism  for  the  users  once  a  conflict  arises. 
Therefore,  the  capacity  of  the  tree  algorithm  is  larger  than  that 
of  the  Aloha  scheme. 

Feedback  can  also  provide  information  about  the  set  of 
simultaneous  users.  For  the  Aloha  scheme,  a  collision  informs 
the  transmitter  that  there  is  at  least  one  other  message 
transmitted  simultaneously.  Other  transmitters  need  not  be 
provided  this  information.  For  the  tree  algorithm,  feedback 
distinguishes  the  cases  of  none,  one  or  more  than  one  message  in 
a  slot.  This  information  is  broadcast  to  every  transmitter.  Such 
information  about  the  size  of  the  set  of  simultaneous  users 
improves  channel  capacity. 

Besides  these  three  reasons  why  feedback  may  improve 
throughput,  feedback  very  often  reduces  the  complexity  of 
encoding  for  reliable  communication.  The  Aloha  scheme  and  the 
tree  algorithm  do  not  require  error  correction  coding  for  their 


operation,  whereas  error  correction  coding  is  indispensable  for 
the  collision  channel  without  feedback.  The  doubly  exponential 
error  exponent  of  the  multiple  access  OR  channel  with  feedback 
has  been  established  by  Gyorki  and  Kerekes  [8]. 

It  is  hard  to  obtain  closed  form  characterization  of 


channel  capacity  for  the  multiple  access  channel  with  feedback. 
Therefore,  the  issue  of  feedback  will  not  be  given  further 
attention  in  this  thesis. 
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1.5  The  issue  of  codebook  knowledge 

If  the  codebooks  of  each  transmitter  are  known  at  the 
receivers,  each  receiver  may  determine  jointly  the  most  likely 
code  sequences  sent  by  the  transmitters.  However,  this  joint 
decoding  effort  is  often  computationally  infeasible. 
Consequently,  the  signals  of  the  other  users  are  often  treated  as 
interfering  noise  which  are  not  estimated  for  noise  reduction. 
The  code  division  multiple  access  scheme  for  the  spread  spectrum 
channel  is  one  example  in  which  joint  decoding  is  not  performed. 
Incomplete  codebook  knowledge  also  arises  from  two  other 
situations.  First,  some  of  the  codebooks  of  the  other  users  may 
not  be  known  to  a  receiver  for  security  reasons.  Second,  jamming 
may  occur  in  the  system.  The  capacity  of  these  channels  with 
incomplete  knowledge  of  the  codebooks  is  of  great  practical 
interests.  The  characterization  of  the  capacity  region  for  such 
channels  is  the  subject  of  chapter  4. 

As  an  illustration,  consider  a  multiple  access  channel 
with  two  users  Ux  and  Uz.  Each  codeword  x  and  z"  used  by  U*  and 
Uz  respectively  are  required  to  satisfy  the  following  constraints 
vs  -n 

■£,c*(x*  ’  * nc* 

n 

cz.<z  )  *  )  «  nCz. 

in  which  the  subscript  k  denotes  the  k-th  symbol  of  a  sequence; 
and  Cy  ,  cz  are  cost  functions  on  the  alphabets  X  and  2 
respectively.  Suppose  the  receiver  of  only  knows  that  the 
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codewords  of  satisfies  the  second  constraint.  The  question 

then  is  how  the  receiver  of  should  perform  decoding. 


More  specifically,  we  assume  that  the  decoder  of  Ux  uses 
some  metric  ax^  ^  0  for  each  x  fr  X  and  y  e  Y  (Y  is  the  channel 
output  alphabet).  The  decoding  rule  is  to  choose  the  message  with 
a  codeword  that  maximizes  the  sum  of  the  metric  between  each 
symbol  in  the  codeword  and  the  corresponding  channel  output 
symbol.  The  choice  of  the  metric  a  %l^  depends  on  the  decoder's 
knowledge  (which  can  be  incorrect)  about  the  channel  or  the  other 
users  of  the  channel.  Thus  by  imposing  a  specific  structure  for 
the  decoder,  we  try  a  novel  approach  to  the  problem  of  robust 
decoding.  This  approach  generates  three  results  in  chapter  4. 


The  first  result  gives  an  achievable  rate  for  a  single 
user  D x  communicating  over  a  stationary  and  memoryless  channel 
with  channel  transition  probabilities  p(y/x).  The  decoder  uses  a 
metric  {aXj  1  which  may  be  different  from  that  used  for  maximum 
likelihood  decoding,  namely  a  ^  =  In  p(y/x).  The  result  states 
that  reliable  communication  is  achievable  for  rates  less  than 


C  =  max  I ' (X; Y) 
P(X) 


in  which 


I  '  (X; Y)  =  H(X)  +  H ( Y)  -  H'  .(XY) 

H'(XY)  =  max  H({f  1)  =  max  Z-f  In  f 


subject  to 


0  for  all  x,  y 

£  axHfx*,  aXHP«?  =  P(X=x,Y=y)  =  P(X=x) .P(Y=y/X*x) 

%  f»i  *  }  p«?  £or  a11  x 

X  “  i  P**J  for  a11  y 

The  function  I'(Z;Y)  has  the  pleasing  property  of 
0  $  I ' (X;Y)  $  I (Z; Y) 

Furthermore  I'(X;Y)  is  convex  U  in  P(Y/X).  We  conjecture  that  C 
is  also  an  upper  bound  for  the  achievable  rate,  thus  making  C  the 
capacity  of  the  channel  with  given  decoder  structure. 

The  second  result  involves  the  two-user  channel  stated  at 
the  beginning  of  this  section,  with  Uz  trying  to  jam  Ux.  Let 
la^l  be  the  metric  used  by  the  decoder  of  Ux  .  Define  the 
capacity  C  for  Ux  (which  has  a  given  decoder  structure)  as  the 
maximum  rate  that  Ux  may  transmit  reliably  for  all  codewords  of  U. 
satisfying  the  constraint 

n 

cz(z  )  $  nCz 

The  result  states  that 

C  *  min  max  I (X; Y) 

P(Z)ea  P(X)t* 


in  which  &  is  the  set  of  probability  distributions  on  the 
alphabet  Z  satisfying 


39 


p(z)  ciu>  «  ci 

and  %  is  the  set  of  probability  distributions  on  the  alphabet  X 
satisfying 

2.  P(x)  c  (x)  4  Cx 
xtX  *  * 

This  capacity  is  achieved  when  a,  is  chosen  to  be  In  P(xy)  with 

*  / 

P(xy)  =  \  P(xyz)  =  Z  P*(x)  P*(z)  P(y/xz) 

in  which  P*(X)  is  the  probability  distribution  that  achieves  the 
maximum  in  *,  and  P*  (Z)  is  the  probability  distribution  that 
achieves  the  minimum  in  *. 

The  third  result  involves  a  system  with  M  asynchonous 
users.  There  is  an  (M+l)-th  source,  which  tries  to  jam  the  M 
users.  We  assume  that  receiver  m  is  only  interested  in  the 
message  of  -user  m.  The  receiver  m  knows  completely  the 
codebooks  of  those  users  in  the  index  set  I*  c  {1,2,  The 

receiver  m  performs  joint  decoding  for  the  set  I*  ,  while 
assuming  a  metric  which  takes  into  account  the  effect  of  the 
channel  and  the  users  in  I M  =  {1,2,...,M]  -  Iw.  A  set  of 
constraints 

»  -n 

Cl(xJ)  *  £  C  (x;<4)  $  nC; 

n 

for  all  1  $  i  ^  M+l  are  placed  on  the  codewords  x ■  of  the  M 
users  and  the  jammer.  The  result  states  that  the  capacity  region 
is  given  by 
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for  all  l£»n$M+l  and  for  all 
and  w  6  jQm 
“  -O-  >n 

Thus  for  the  two-user,  no  jammer  asynchronous  multiple  access 
channel  without  joint  decoding  (I  ,*-{11,  Iz*{2^),  the  capacity 
region  A  is  given  by 

&  *  U  £ 

PU.KP^X,) 

in  which  (R,  ,Rj)«^  if 
R  ,  <  I(X, ;Y) 

R  2  <  1(X2;Y) 


A  specific  &.  is  shown  in  figure  4.5.1,  which  is  compared  with  an 
for  the  case  with  joint  decoding. 


1.6  Specific  channels  and  coding  schemes 

The  general  theories  stated  in  the  previous  sections  will 
be  applied  to  three  channels  of  practical  interest,  namely  the  OR 
channel,  the  spread  spectrum  channel  and  the  collision  channel. 
The  number  of  users  is  large,  with  only  a  small  portion  active  at 
a  time.  Due  to  the  difficulties  of  scheduling  a  large  number  of 
users  as  well  as  maintaining  synchronization,  the  users  will 
transmit  asynchronously.  Joint  decoding  is  considered  for  the  OR 
channel.  For  the  collision  channel,  there  is  hardly  any 
distinction  between  the  cases  with  and  without  joint  decoding. 
The  asynchronous  capacity  region  for  M  active  users  will  be 
derived.  The  point  of  most  interest  in  the  capacity  region  is 
where  the  rates  of  the  users  are  equal  and  maximized  (that  is, 
the  farthest  point  on  the  main  diagonal  that  is  inside  the 
capacity  region).  The  capacity  of  these  multiple  access 
channels,  defined  as  the  sum  of  the  rates  at  the  point  of 
interest,  is  derived  for  large  M. 

Specific  code  division  multiple  access  schemes  are 
proposed  for  these  channels.  Such  schemes  have  three  aspects.  The 
first  aspect  involves  the  type  of  modulation  used  (PPM  or  on-off 
keying  for  the  OR  channel;  packeting,  superpacketing  and 
interleaving  for  the  collision  channel;  and  the  use  of  PN 
sequence  for  the  spread  spectrum  channel).  The  second  aspect 
involves  the  type  of  code  used  (block,  convolutional,  rate, 
constraint  length  or  block  size).  The  third  aspect  involves  the 
type  of  decoding  algorithm  used  (sequential  or  Viterbi). 


42 


All  three  aspects  are  important  for  determining  three 
performance  measures,  namely,  error  probability,  decoding 
complexity  and  maximum  achievable  throughput,  which  can  be 
different  from  the  capacity  of  the  multiple  access  channel.  These 
three  performance  measures  are  given  extensive  treatment  in 
chapters  5,  6  and  7.  An  outline  of  the  general  approach  will  be 
given  here. 

In  general,  we  prefer  the  use  of  convolutional  codes  to 

block  codes  for  their  superior  error  performance  and  relative 

ease  of  decoding.  The  code  rate  used  is  typically  low  (1/2,  1/3 

or  even  smaller)  because  interference  from  other  users  can  be 

quite  severe.  Severe  interference  also  necessitates  the  use  of 

long  constraint  length  codes  to  improve  error  performance. 

»• 

An  upper  bound  on  throughput,  which  is  of  more  interest 
than  capacity  when  sequential  decoders  are  used,  is  the  cutoff 
rate  R#,  which  is  the  maximum  rate  of  encoding  to  guarantee  a 
bounded  average  computation  per  information  bit  for  sequential 
decoding.  The  cutoff  rate  is  computed  for  the  three  channels 
considered,  with  the  interference  of  the  other  users  treated  as 
memoryless  noise.  It  is  noteworthy  that  cutoff  rate,  unlike 
capacity,  depends  on  the  modulation  used.  (In  fact,  PPM  has  a 
higher  cutoff  rate  than  on-off  keying  for  the  OR  channel.)  The 
cutoff  rate  is  used  for  two  purposes.  First,  the  bit  error  rate 
Pfc  for  the  random  code  ensemble  can  be  conveniently  upper  bounded 
as  a  function  of  the  constraint  length  and  the  rate  of  the 
convolutional  code,  as  well  as  the  cutoff  rate.  We  expect  the 
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bit  error  prababilities  for  specific  codes  to  agree  closely  with 
this  random  coding  bound  since  for  long  constraint  length  codes, 
we  have  no  better  way  to  select  good  codes  than  to  pick  them  at 
random.  Since  the  constraint  length  is  long,  we  expect  low 
probability  of  error  due  to  path  merging  [10]  in  the  trellis.  The 
second  and  more  important  use  of  the  cutoff  rate  is  for  upper 
bounding  the  decoding  complexity  Cj  defined  as  the  average 
computation  per  node  for  sequential  decoding  over  the  random  code 
ensemble.  The  bound  shows  that  average  computation  per  node  for 
the  OR  channel  or  the  collision  channel  is  small  (less  than  10) 
for  throughput  close  to  the  cutoff  rate. 

The  obvious  question  then  is  the  maximum  sum  of  the 
throughputs  for  all  users  that  can  be  reliably  achieved  by 
convolutional  codes  with  sequential  decoding.  Let  T  be  the  total 
throughputs  of  a  specific  scheme,  with  each  user  employing  a  rate 
r  convolutional  code.  Given  the  information  rate  (T/M)  each  user 
wants  to  convey  and  the  method  of  encoding  the  information,  the 
level  of  interference  contributed  by  the  user  to  the  channel  is 
determined.  Hence  the  cutoff  rate  for  each  user  is  a  function 
R#(T,r)  of  T  and  r.  In  order  that  reliable  communication  may  be 
achieved  with  sequential  decoding,  the  rate  r  must  be  less  than 
the  cutoff  rate,  hence  r  <  R#(T,r).  For  given  r,  we  solve  for  the 
largest  T  which  satisfies  the  above  inequality.  We  may 
subsequently  maximize  T  over  convenient  values  of  r.  The  result 
is  the  maximum  sum  of  the  throughputs  for  all  users. 


*_*  ‘ 


t  o 


Expressions  in  the  form 


Pw  -  Pb(R, ,k,r)  *  Pb(T,k,r) 
i  »  Cj(Rft,r)  -  Cj (T,r) 

are  also  obtained  for  the  three  channels.  The  values  of  Pfc  and  Cj 
versus  T,  for  different  values  of  k  and  r  are  plotted  in  the 
thesis.  The  probability  of  buffer  overflow  for  sequential  decoding 
is  also  examined  for  the  OR  channel  and  the  collision  channel. 

Several  specific  issues  concerning  these  channels  arise  in 
chapters  5,6  and  7,  and  a  few  highlights  are  listed  here  before 
we  conclude  this  discussion. 

1.  For  the  spread  spectrum  channel,  using  a  short  PN  sequence 
and  a  low  rate  convolutional  encoder  achieves  a  higher 
throughput  (.721  for  capacity  and  .361  for  cutoff  rate) 
than  using  a  3ong  PN  sequence  and  a  high  rate 
convolutional  encoder. 

2.  For  the  OR  channel,  PPM  achieves  a  higher  total  cutoff 

rate  (.69)  than  on-off  keying  (.59).  We  recommend  using  a 
single  bit  input,  single  PPM  symbol  output  (that  is,  rate 
*  1  bit/1  PPM  symbol)  convolutional  code.  A  forward 

search  decoding  algorithm  described  in  chapter  5  requires 
very  low  complexity  even  at  a  high  total  throughput  of  .5. 
Nonbinary  trellises  should  not  be  used  because  they  have 
lower  throughput. 


For  the  noiseless  collision  channel,  we  recommend  using  a 
rate  1/3  convolutional  encoder  and  a  forward  search 
decoding  algorithm.  The  maximum  sum  of  throughput  is  .295. 
The  decoding  complexity  is  low  even  at  a  high  throughput 
of  .28. 


Consider  the  collision  channel  corrupted  by  Gaussian 

noise,  in  addition  to  erasure  ’noise'  due  to  collisions. 

Chapter  7  shows  that  substantial  power  saving  is  gained, 

at  hardly  any  extra  cost,  by  the  redundant  coding  that  is 

originally  intended  to  correct  erasures  due  to  packet 

collisions.  A  signal  to  noise  ratio  of  4  or  5  dB  would  be 

sufficient,  which  compares  favourably  with  the  typical 

10.5  dB  required  for  uncoded  antipodal  signaling  for  a  bit 

-6 

error  rate  of  10 
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Chapter  2  The  Asynchronous  Multiple  Access  Channel 


The  main  purpose  of  this  chapter  is  to  provide  an 
understanding  of  the  effect  of  asynchronism  on  multiple 
accessing.  We  shall  assume  that  the  codebooks  for  the  users  are 
known  to  every  receiver.  Section  2.1  gives  a  precise  definition 
of  asynchronism  for  the  M-user  multiple  access  channel  and 
characterizes  the  asynchronous  capacity  region.  Sections  2.2  and 
2.3  prove  the  direct  part  and  the  converse  of  the  coding  theorem 
respectively  for  the  two-user  case.  The  proof  of  the  theorem  for 
the  M-user  case  is  a  straight  forward  extension  of  the  result  for 
the  two-user  case.  The  subject  of  incomplete  codebook  knowledge 
is  studied  in  chapter  4.  The  characterization  and  proof  for  the 
capacity  region  of  the  asynchronous  M-user  multiple  access 
channel  with  incomplete  codebook  knowledge  are  conceptually 
straight-forward  once  the  essence  of  the  proofs  in  chapter  2  and 
4  is  well  understood. 

2.1  Modeling  and  characterization  of  the  capacity  region 

Let  there  be  M  sources  sending  independent  messages.  The 
source  i,  l^i^M  uses  a  codebook  containing  2  ~  eguiprobably 
used  codewords  each  representing  the  message 

j.£  J»{1,2, . . .  ,2  .  Each  codeword  x .  ( j  . )  *  (x . .  ( j  . ) )  ,  lsk^n, 

^  A  A.  A  A  Am 

where  x.*(j.).f.X,  the  signaling  alphabet.  Let  C  be  the  ordered 

A 


We  shall  look  at  how  asynchronism  arises  in  communication 


systems.  Each  transmitter  has  a  clock  that  runs  accurately. 
However,  the  initialization  may  be  different  for  each  clock. 
Suppose  a  standard  clock  exists.  Let  A,;  be  the  amount  of  time 
the  clock  i  is  running  ahead  of  the  standard  clock  (  a^<  0  for  a 
lag).  We  may  have  some  a  priori  knowledge  about  A.  such  as 

A.  ' 

upper  and  lower  bounds  or  more  specifically,  a  joint  probability 
density  P( A , , , . . . ,  )  which  we  shall  assume  known 
henceforth.  We  assume  that  the  A^'s  are  mutually  independent, 
thus  P(  A, ,  A4, . . . ,  AM)  =  P(Af  ).P(A2)...P(  Am).  Since  the 
channel  is  assumed  to  be  symbol  synchronous,  time  may  be 
considered  as  discrete  and  hence  the  A.'s  mav  be  assumed  to  be 
integers. 

Communication  systems  often  have  a  fuzzy  agreement  of 
time.  While  the  transmitters  may  not  be  synchronized  up  to 
microseconds,  they  should  have  an  idea  of  the  hour  or  the  day. 
Hence  P(  is  usually  nonzero  only  over  a  finite  interval.  Let 

s  ®ax  max  {  \A;  -  A  j  J  :  P(  AJ  ,P(  A  j)>0  1 

Consider  the  use  of  round-robin  TDMA  for  the  collision  channel 
with  guard  time  A*Aybetween  consecutive  transmission  intervals  of 
two  different  users.  If  the  transmission  interval  of  a  user  is 
much  larosr  than  ,  the  amount  of  overhead  due  to  guard  time 
is  negligible.  Therefore  full  channel  capacity  may  be  achieved. 
The  penalty,  however,  is  a  large  delay  and  increased  buffer 
requirement.  Thus,  the  encoder  may  be  constrained  to  have  a 


■.  /.  /, 
L-A.'.,-’.-  V-'-. 
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complexity  (including  buffering)  that  falls  far  short  of  that 

required  to  achieve  full  channel  capacity  using  round-robin  TDMA. 

Since  this  encoder  with  "limited  complexity"  operates  over  a  long 

time  interval,  the  encoder  would  have  to  be  used  "periodically". 

We  would  like  to  make  the  terms  in  quotes  precise  through  the 

following  formulation.  The  complexity  of  the  code  is  the  length 

of  the  codewords,  namely  n.  The  codebooks  are  used  periodically. 

t 

Each  source  sends  the  codewords  x.(j.,)  sequentially,  where  t  is 

t  4.  A.  ■ 

an  integer  and  j.;  is  the  t-th  message  sent  by  the  user  i  (figure 

*"  t 

2.1.1).  The  codeword  x^(j.  )  starts  at  time  0  registered  by  the 
clock  at  the  transmitter  i.  Time  for  each  transmitter  is  divided 
into  periods  of  length  n.  The  random  variable  D.  is  the  delay 
between  the  start  of  periods  for  the  standard  clock  and  the 
transmitter  i  (figure  2.1.1).  Thus  0$D.$n-l.  Consequently,  P(D.) 

is  formed  by  aliasing  P(A^),  or 
00 

P  (D  ■  *  d-)  *  2L  PfA,;  *  nt  +  d-) 

The  degree  of  asynchronism  is  totally  specified  by  the  knowledge 
of  n  and  P(D^  ),  l$i$M.  We  say  that  a  system  is  perfectly 
synchronous  if  P(D^=0)=1  for  all  l^i^M.  A  system  is  uniformly 
asynchronous  if  P(D.)  is  uniform  for  0$D^n-l.  A  system  is  very 

^  X 

asynchronous  if  it  is  uniformly  asynchronous  for  large  n.  In 
essence,  a  very  asynchronous  system  has  the  clock  at  each 
transmitter  randomly  initialized  over  a  long  period  of  time. 

However,  imposing  an  upper  bound  on  the  complexity  measure 
n  of  the  encoder  may  cause  some  problems  when  we  apply  classical 


-\ 
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Figure  2.1.1  Code  asynchronism 
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information  theory.  In  proving  the  direct  part  of  coding 
theorems,  the  error  probability  approaches  zero  as  the  complexity 
measure  n  increases  without  bound.  Thus,  a  finite  encoder 
complexity  may  give  a  nonzero  error  probability.  However,  we 
shall  assume  that  the  upper  bound  n  is  sufficiently  large  so 
that  the  error  probability  would  be  satisfactorily  small.  Yet  n 
is  not  large  enough  for  using  round-robin  TDMA  that  achieves  full 
channel  capacity.  For  systems  that  are  very  asynchronous,  the 
upper  bound  n  can  be  very  large,  in  which  case  asymptotical  zero 
error  probability  may  be  approached. 


The  channel  is  assumed  to  be  memoryless  with 
P(y/x,xr. .  .xM)  known  for  all  x,€  X,  l$i£M  and  y^Y.  We  shall 
assume  that  the  receiver  Z  is  only  interested  in  the  messages 
sent  by  'the  transmitter  Z .  The  decoder  observes  the  channel 

A  t 

output  sequence  and  concludes  that  the  sent  messages  are  . . 

One  performance  measure  of  a  coding  system  is  the 
probability  of  bit  error  for  each  receiver,  which  is  defined  as 


t  ^ 

<P,t  >  -  1/»RX  I  ) 


in  which  J ^  and  are  respectively  the  k-th  bit  of  the 

a  r  ^ 

binary  representation  of  and  .  The  capacity  region  <R. 

is  defined  as  the  set  of  all  M-tuples  (R, ,R2, . . . ,RM)  such  that 

there  exist  sequences  of  codebooks  for  each  source  which  make 
x  . 

<Ptf<4>  go- to  zero  for  all  X  and  t  as  n  increases.  Define  the 
reg  ion  fl*  by 


J 


(R , , . . . ,  Rm)  €■  K  iff  for  some 


=  ^ i )  •  p ^ )  •  •  •  )  •  p ( y/^/  •  •  •  r 

1/  {X  ^  1  J .  UU  M 


for  all 


.  c  {1,2, . . .  ,M}  and  l  ill, 

L  *** 


and  _n£  =  { 1 , 2 , . . . ,  Mj  -  SiJL 

The  major  result  of  this  chapter  is  that  for  the  very 
asynchronous  channel,  (P  =  (P  .  For  the  two-user  case,  <%.  is  given 


(R,,R iff  for  some 
P(x,,xz,y)  =  Px  (x|).P2(xi).P(y/x,x2) 


<  I  (X , ;  Y/  X2) 


<  I(X2;  Y/  X() 


R,  +  Ri<  Y) 

In  contrast  to  the  capacity  region  for  the  synchronous  multiple 
access  channel,  the  above  characterization  does  not  contain  a 
convex  hull  operation.  In  essence,  the  time  sharing  argument  for 
forming  the  convex  hull  of  the  capacity  region  is  not  permitted 
since  the  two  users  are  asynchronous.  In  a  paper  studying  the 
asynchronous  multiple  access  channel,  Cover  et.  al.  [7]  argue 
that  the-  capacity  region  is  not  affected  by  asynchronism. 
However,  the  asynchronism  they  considered  has  a  code  complexity  n 
which  is  much  larger  than  the  maximum  delay.  In  contrast,  the 


very  asynchronous  channel  we  are  studying  involves  codes  with 
complexity  much  smaller  the  the  maximum  delay. 

Section  2.2  proves  the  direct  part  (  R  C  )  for  the 
two-user  case  by  achieving  reliable  communication  for  all  points 
in  (R.  through  random  coding.  Section  2.3  proves  the  converse 
(- (R,  C.  0? )  by  showing  that  all  codes  with  rate  outside  of  has 
a  bit  error  probability  lower  bounded  above  zero. 


■AW  WW 
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2.2  Direct  part  for  the  two-user  asynchronous  channel 


The  proof  of  the  direct  part  employs  random  codes  to 
achieve  all  points  of  &  .  The  set  of  codebooks  <C,  and  £*  for  the 
two  users  are  generated  randomly  in  the  following  manner.  Each 
codebook  C  -  e  (L‘  ,  i  =  1,2  contains  the  codewords  x.(j.),  each 

„  a  a 

chosen  independently  for  the  message  l^j.^2  =  .  Each  letter 
x.*(j.)  in  the  codeword  is  chosen  independently  for  l£k$n, 
according  to  the  probability  distribution  P.(X).  Therefore 


P(U-(ji) 


For  the  sake  of  simplicity  for  the  proof,  the  encoder  has 
two  special  features.  First,  the  encoder  inserts  a  preamble  once 
every  m  messages  for  the  purpose  of  code  synchronization.  The 
preambles  used  in  the  proof  are  the  codewords  x^.(J^=l)  or 
x^(J.=2)  ,  which  are  set  aside  for  the  purpose  of  code 
synchronization.  The  preambles  are  recognized  by  special  devices 
at  the  decoder,  and  consequently  give  the  values  of  the  delays  df 
and  d^.A  description  of  this  code  synchronization  mechanism,  a 
justification  of  the  fact  that  code  synchronization  can  be 
achieved  and  a  quantitative  estimate  of  the  amount  of  preamble 
overhead  required  to  achieve  synchronization  are  given  in 
Appendix  2.1. 

The  second  special  feature  is  that  the  encoders  put  out 

codewords-  in  such  a  way  that  in  any  sequence  of  m+1  consecutive 

messages  (including  the  preambles),  none  of  the  messages  is 

t  t' 

repeated.  In  other  words,  J..*J.  for  all  t*t’  and  0$t,t’$m+l. 


The  encoder  achieves  this  by  keeping  a  record  of  the  past  m 
messages.  If  the  next  message  has  occured  in  the  past  m  messages, 
it  is  put  aside  for  a  while  and  the  next  message  is  considered. 
Provided  m  «  2.  -,  the  backlog  of  messages  should  be 
sufficiently  small  compared  with  m.  Since  there  is  no  repetition 
in  m+1  consecutive  messages,  any  two  disjoint  sets  of  code 
symbols  in  any  m+1  consecutive  codewords  for  both  users  would  be 
statistically  independent  over  the  random  code.  This  independence 
property  will  be  used  in  proving  the  direct  part  of  the  coding 
theorem.  Other  than  for  the  purpose  of  proving  the  direct  part, 
the  insurance  that  the  messages  are  not  repeated  seems 
unnecessary  for  reliable  communication. 

The  decoding  for  the  first  receiver  is  performed  as  shown 

in  figure  2.1.1.  The  first  receiver  detects  the  preambles 

•  m+i  t 

xt(Jr  ;=1)  and  x, (J,  »2)  for  the  first  user,  and  xa(J4.  =1)  for 

some  0$t<m  for  the  second  user.  (In  figure  2.2.1,  t  *  2.)  For 
the  sake  of  independence,  we  have  used  two  different  preamble 
sequences  alternately.  It  is  assumed  that  the  preamble  sequences 

•  Wtl  x 

x, (J,  ,) ,  xt (Jr  ..  ) ,  xt(J4  %)  as  well  as  the  values  D,  and  are 
detected  correctly.  This  assumption  is  justified  in  Appendix  2.1. 
Without  loss  of  generality,  Df  is  assumed  to  be  larger  than  D^. 

The  decoder  for  the  first  user  decodes  by  observing  Y*1"  ,  which 

o  vrl 

starts  as  x2  ( J  a  )  begins  transmission  and  ends  as  x 2  (JQ  ) 

terminates  transmission,  as  shown  in  the  figure.  Hence  the  length 
r 

of  Y.  isr»(m+l)n.  The  corresponding  code  symbols  for  user  1  and 

+  (m)  (m)  + 

user  2  are  denoted  respectively  as  x  (  (J  )  and  x  z  (J  2  ), 

where 
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S3 


m 


i*)+ 

j2  - 


(J,  »  J . J,  > 

,  o  >  w 

(G  3  »  *^2  /•••»  J  j  ) 


There  is  a  special  reason  why  the  channel  sequence  Y..  is  blocked 

in  such  a  manner  for  decoding  the  message  of  the  first  user.  The 

t  x 

estimation  of  j,»  depends  on  the  estimation  of  as  well  as 

.  t-i  t  tr  t-i 

3X  since  xt(j,.)  overlaps  with  xz(j4.)  and  x^j^  ).  The 

t-t  ft  t  *’ 

estimation  of  . .  in  turn  depends  on  the  estimation  of 

t 

and  j 4  ;  etc.  Thus,  the  estimation  of  the  messages  is  chained. 

» 

This  chain  is  terminated  when  the  estimation  of  j*.*  (or  at  the 
W\  1  »*1 

other  end  j* . )  depends  on  the  estimation  of  jf.  only  (or  j,  . 

0  rnf> 

only)  since  the  preamble  j,  .  (or  j,  .  .)  is  known.  Thus  the 
preambles,  besides  synchronization,  also  eliminate  a  "boundary 
effect"  in  the  process  of  estimation. 

The  decoding  function  for  the  first  user  is  defined  by 

9,(yr)  ■  j  t(M)  -  (j  -j,1  -  *  any  element  of  G|(yr) 


in  which 


/  •  •  •  1 


G,(y  )  -  {(j,*  .j’  r...,j|M)s  (x;(jrj),xl(jp,yr)eT;i 

We  shall  give  a  definition  of  T ^  later  on.  If  G((yr)  is  empty, 

then  the  decoder  signals  a  decoding  failure.  Before  giving  a 

definition  of  T*’  ,  we  would  like  to  examine 

*  *■'<"»)  ~ r  c'*’)  «»*»)♦ 

P(xJ(j|  -),xt(j4  ),y./j|  »-jt  ).  Since  the  codewords  are 

generated  independently  symbol  by  symbol  and  used  nonrepeatedly , 
and  the  channel  is  memory less,  we  have 


v.v.v.v.  <■; 


p<*,*<i,  ><*,*<jrv' > * ; '  -ia"  > 


7  (•*)+  .  .*•">  + 


in  which  the  subscript  k  denotes  the  k-th  symbol  of  a  sequence. 
Four  cases  (figure  2.2.2)  can  arise  at  the  k-th  location  (which 

is  assumed  to  be  embedded  in  the  t-th  codeword)  of  the  three 

4  im>  <*o+  r 

sequences  xf(jt..),  x^(j1  )  and  y  , 

1*  3  ,  *  3  , 

~  t  .  * 

3,  “  3i 

For  this  case,  we  would  leave 

P(X  ,*  (3  ,  ).>;(  (3,  ).y*  /),  3,  > 

as  it  is  in  the  product. 

„  .  .  .  t 

2-  3  * 

3*  »  j,* 

*•  4.  j. 

since  j  ,  ^ j  (T  and  different  codewords  are  generated 


independently, 

-f  (m)  ,^<W+  .  r  ..<*0  .  . 

p(x  ,k  (3,  >.*Jk  (3,  >.y*  /:,  3,  ) 


+  <■*' 


P(*,*  <3I°")))  •  p(x2t  ( 3  , .  ) 


,  y< 

3*  3 , 


-3. 

V  *  i* 

Similar  to  2 


■jVw-*  . 


in  which  case 


+ 

.  r  (m) 

)) 

~  <>*)+  r 

'x*t  <3,  >*y* 

It 

^  , 

* 

>*  1 

)) 

•  p(x,*  >> 

four 

cases 

mentioned  above, 

+  <*,> 


—  'r,v  o'  ,(*i)+  r  ( >*0+ 

-log- P(x,^( $  .  ),  Xj4l<jz-  )/Y-ft/jr  Di*  )  are  respectively 
H(X,XxY),  H(Xf ) +H(XjY) ,  H ( X  t  Y ) +H ( X^ ) ,  H(X, )+H(X^)+H(Y) .  An  error 
pattern  is  defined  by  specifying 


W ,  -  {  t  :  t  j  *  ,  l^t$m  1 

w2  *  (  t  •  3**  I4  3**  .  O^t^m  1 


Specifying  Wf  and  Wa  determines  which  of  the  four  cases  the  k-th 
location  of  the  r-sequences  x+ ,  xt  and  yr  fall  into.  Knowing  the 
error  pattern  Wf ,  and  the  delays  Dr ,  Dt ,  we  can  derive  K, ,  Kt, 
K^,  K^,  the  collections  of  k  that  belong  to  the  four  respective 
cases.  Thus  we  shall  use  the  notation  P(x+ ,xt ,y r/W,W1 )  instead  of 


t  *w<*>  viw+  r  <*w)  (*»Jt 

P(*T(j,  ),x*(j*  )ty-/3,  jt  )• 


We  define  T*  as  the  set  of  fe-typical  r-sequences  x+,  x2 
yr  that  have  -1/r  log  P (x*,x2,yr/W^ ,W2)  close  to  the  mean  value 
(given  W^Wj)  of  P  (x^^x^jy^/W^  'w2>  •  for  a11  possible  error 
patterns  W.,W0.  Therefore 
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t  s  {  U,  *x2  »y)  €  Xr  x  xr  x  Yr 


:  1/r  1 -log  P(x*  ,x,  ,yr/W1,w2)  - 


S 


{  1K.I  [H(X,  X2Y)  ]+  |Kj|  [H(X,  )+H(X3Y)  ]  + 

iKji [H(X,Y)+H(X2)]  +  IK^I [H(X, )+H(Xa )+H(Y) ]  ){<  6 

for  all  W,  c  {1 , . . .  ,nf}  and  w,  c  {0,1,...,  ml  ] 

where  r«(m+l)n.  Note  that  T€Y  is  a  set  of  sequences  defined  in 

terms  of  typicality,  and  has  nothing  to  do  with  the  actual 

received  sequence  or  each  instance  of  decoding.  That  is  why  we 

+  r  f 

use  the  notation  P(x. ,x4 ,y./W,W2 )  instead.  Te  is  introduced  to 
provide  a  convenient  bound  on  P(x? ,xt ,yr/W,Wt) 

For  the  code  pair  C  ■  (C(  ,C4  )  and  messages  j,  and  j 
,  the  sequence  error  probability  is  defined  as 

P  (r  i  ^  .  pi  .  l*1)  .  •  i .  i1")  .  (**)+  . 

PC,|  (C'3  ,  »3a  )  -  P(J,  *  J  ,  /j  t  rj2  ) 

Since 


t  t 


,  .  <**)  |*i>4  t  •*>  A  j.  j. 

pe„  <cO,  )  *  <P e,,  >  -  1/nR,  5.  P(j;  ) 

for  all  l$t$m,  it  is  sufficient  to  prove  the  direct  part  by 

showing  that  it  is  possible  to  make  Pe,  t  (C,j J"0.  ,  ) 

vanishingly  small  for  some  codes  C,  for  all  j  ^  and  j J”0* 

.  (»0  (mH 

Instead  of  computing  Pg,  f  (C,j,  .  ,j4..  ),  we  shall  evaluate  the 
more  easily  computable  ensemble  sequence  error  probability 


zjcwftr 
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(>*>,)  /  v,l  + 

p*" ((l)  "At  ?«-,<»*  p<.'  (c'3'  ,j>  > 

(m)  fMJ  f 

Due  to  the  symmetry  of  the  random  code,  the  fact  that  j,  and  j 

*  v0  tM)  + 

do  not  contain  repeated  messages  and  that  all  j  ,  ,  j  are 

«*n)  (M/+ 

equally  likely,  the  expectation  over  j  ,  ,  j a  may  be  dropped 


Thus 


*  • 
-  o 

(M) 

i 

.i™  '  ^ 

(h,J 

the  same  for  all  j  , 

and  3 

(C, 

.  <M)  .  <>i 0+  , 

-  CM)  <M)+ 

P*,. 

3,  O*  > 

*  P«,. 

) 

_  (*•>) 

(M>+ 

» 

t,  i 

( 

1  .j, 

-jj  )  is 

the  value 

(#«; 

(*)+ 

r  • 

O* 

.  ) ,  with 

C  treated  as  a 

source  of 

randomness. 

Over  the  random  code  and  the  random  channel,  define  the 

F  I'*’*'1  *  (X  «  Xa<3,  >r  *  )€Tfe  7 


Over  the  ensemble  of  codes,  the  sequence  error  probability  is 
then 

5.., 

«  P(  {*?  (l!W  ).ir)^T(r  5  O  ) 

J I »  ")»  7 


.*?«*>  <*>) 
O'  51  ji 


S  P(  X,+  (j;M  ),x  (j^,+  ),Yr)^  T^)  +  2L 

’  ■*  z  •?«*>•>  j  t*-;  *  •  1  *• 

*•  >1» 

•■»!•'*  jT* 

The  first  term  is  less  than  €  provided  r  is  larger  than  some  rCy 
which  is  proved  in  appendix  2.2  using  the  law  of  large  numbers. 


Next,  consider  each  term  in  the  summation 


P(F-o*>  ) 

1*  ,  U  «  1 


1  P(«:  ( j ).x, l.yH/j;-1  j'””4 


S  2  exp^-lK.I  H(X,X,Y)-  IK2|  [HU,  J+HU**)]  - 
T« 

|K, |  [H(X|Y)+H(X2)3-|K<,j  [H(X,  )+H(X2)+H(Y)]  +  r6  ) 


|Ttyl  expr  ( 


nun 


$  expi(r(H(X,XJY)+ e  ) )  exp  x  ( 


nun 


exp  <-|R,l  KX.jXjY)-  lK^I  I  (Xa;X,Y)-  IK^I  I(X,X2;Y)+2r  €•  )  } 


-  exPl(  -  lKj.1  I  (X  ,  ;1/Zz)-  |Kjl  I  (X2;  Y/X , )  -  |K^|  I  (X,X2  ;Y)+2r  6-  )  )  f 

in  which  a  follows  from  the  definition  of  F„ (M)  ~tK)+  »  and  b 
follows  from  the  definition  of  Tg - .  The  value  of  JT#r  l  in  c  is 
overbounded  in  d  by  the  following  argument.  Consider  specifically 
,  hence  |K,|  =  r,  IK,  1  =  IK,  \  «  I  K*|  =  0.  From  the 

definition  of  1^r  ,  we  have 

PU x2(j,‘w4 ).  y?  «,-«,=  /) 


*  exp  (-r  (H(X,XjY)+ e  )  )  - 


Since 


1  ^  Z  P(x  *  )  ,xa  ( )  ,y  r/W,  =Wa=  £) 

(x^,x- ,yr/W1 ,W.) 
satisfying  * 

*  Z  P(«:  (j/',*,)r*,(j,W+  )pYr/W,=WJ  =  <^) 


*  exp2(-  r(H(X,X2Y)  +  O  ) 

-  1t/|  exp2(-  r (H(X |XaY)  +  *)  ) 

As  a  consequence 

Jt/|  $  exp,.*  r  (H(X,X3Y)  +  e  )  ) 

It  follows  then 

p«,i‘£'3,  '3»  > 

S  «  ♦„li.  «PJ(-Hyi«,;Vla)-m,|l(x1;l/Xl)-|K,|I(Xlx2;y)*2rt)  a 

/  A 

*,‘w*  i,w 

-  6  +  Z-  exp  (nR  IW,|  +nR,|W3 1  ). 

w.** 

expt(-|KjI(X,?Y/Xl)-|KJU(Xi?Y/X,)- IKj|JI(X1X2;Y)  +  2rf)  b 

«€  +  Z  expj  | K,|  2&  +  IK4|  (R, -I (X,  ; Y/X2)+2  €  )  + 

*-  *  lKjl(Rj~I  (X,  ;Y/X,  )+2e)+  |K41(R,  +Rz-I  (X,X,;Y)+2e) )  c 

in  which  b  is  due  to  the  fact  that  there  are  exp 2(nR ,  lw,l+nR3l  Wj  ) 
different  possible  pairs  of  j  ^  and  j  x  for  a  given  W,  and 
Wt;  and  c  is  due  to  the  fact  that 

n  lw,l  -  lKxl  +  lx, | 

n  IW21  »  I K,l  +  IKj 


Let  R,  and  R2  satisfy  the  following  inequalities  for  some 
positive  fc>2(m+l)  e 


R  ,  <  I(x,  ;Y/Xa)  -  2  f  -  h 

R  2  <  i(x2 ;Y/x , )  -  2  e  -  s 

R  ,  +  R2  <  I(X,X2;Y)  -  2<r  -  S  * 


we  have 

Pe,,  (^3,  ) 

$  e  +  A  exp  (  lK,|2€  -  S(lKJ  +|Kj|  +  |K*|  )  ) 

Hr, 

;  »,*? 

£  €+2.  expa  (  (m+l).n.2€  -  n$  ) 


V,*\ 

:w,tf 


<  €  + 


exp2(  -n( S  -  2 (m+1) e  )  ) 


The  second  inequality  follows  from  an  obvious  upper  bound  on  /K,/ 
and  an  obvious  lower  bound  on  /Kf/+/K^/+/K y/.  Since  there  are 
only  a  finite  number  of  terms  (of  the  order  2  in  the  above 

"**  (*r\)  (M)  f" 

sum,  we  have  for  all  S>0,  PC|,  (  C,j,  , j  ^  )  goes  to  zero  for 

sufficiently  large  n.  Since 

*«•' <£)  %ft  p«.>  <c'jW  1  * p- 

as  shown  previously,  there  exists  therefore  at  least  one  code  C 
in  the  code  ensemble  that  has 

.  (ta)  ,  i»o + 

P  (C)  -  E  P  (C,j  ,j  ) 

go  to  zero  for  large  n.  Thus  all  points  in  the  open  set 

(R,  ,Rt )]  such  that 
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2.3  The  converse  for  the  two-user  asynchronous  channel 

For  each  transmission  rate  outside  the  closure  of  the 

achievable  region  R. ,  we  seek  to  lower  bound  the  error 

probability  by  a  positive  quantity  independent  of  the  code  used, 

thus  showing  that  reliable  transmission  is  not  feasible  at  these 

n*c 

rates.  The  codebooks  C^,  i=l,2  map  each  of  the  2,.  messages 
into  the  code  sequence  X.(J.).  We  shall  represent  J.  as  a 

Am 

sequence  of  NR^  bits  J.  ^  . J.  j. . . .  *  •  The  code  sequence  X. 

consists  of  N  code  symbols  X. .  .X.  . .  .X. ...  . 

X.\  X.*  X.N 

The  code  sequences  are  shifted  relatively.  Without  loss  of 

generality,  we  assume  that  the  code  sequence  X, „•  remains 

stationary  and  we  shall  just  look  at  decoding  for  the  first  user. 

This  is  allowable  because  the  transmitter-receiver  pair  for  the 

first  user  is  assumed  to  be  synchronized,  either  by  synchronizing 

both  clocks  beforehand  or  by  the  use  of  preambles.  The  code 

N 

sequence  for  the  second  user  X^  is  shifted  to  the  left  by  D 

symbols  with  respect  to  X?.  as  shown  in  figure  2.3.1.  D  is 

assumed  to  be  uniformly  distributed  for  the  integers  between  0 

|/ 

and  (N-n),  inclusive.  Thus  the  first  n  symbols  of  Xf  ,  must 
overlap  with  part  of  X^..  We  denote  the  subsequences  of  these  n 

^  h  n 

symbols  of  X,  ,  X^.  and  Y.  by  X,.,  X^.  and  Y  respectively,  as 
shown  in  figure  2.3.1. 

ft 

The  observation  of  Y  should  help  us  to  estimate  part,  if 
not  all,  of  the  information  conveyed  by  the  messages  Jr  and  Ja . 
Specifically,  we  are  interested  in  the  estimation  of  a  subset  of 
nR^  bits  of  the  set  {J^,  ,  J;(*  , . . . ,  .  We  denote  this 


Figure  2.3.1  The  overlapped  region  for  decoding 


subset  by  L  ^  =  {Lt-  (  ,  2  }  and  its  estimate  from  Y 

A  A  A  A 

by  *  •  •  •  aJ?;  1  •  Since  the  above  model  decodes 

only  for  the  first  user,  the  observation  of  does  not  provide 
much  information  about  a  fixed  due  to  the  uncertainty  of  the 

A 

relative  shifts.  However,  the  statement  about  L4  and  <P^t  >  can 
be  obtained  by  giving  user  2  treatment  identical  to  user  1.  For 
a  very  asynchronous  system,  we  shall  let  N  go  to  infinity  while 
fixing  the  value  of  n. 


The  converse  of  the  coding  theorem  states  that 


For  any  L.  £  { J.  j  - ,  J.^  . , . . . ,  . /}  ,  the  bit  error  probability 


<P 


is  lower  bounded  from  zero  if  (R  ,  ,R,  )  is  outside  A  . 


The  following  arguments  relate  the  entropy  of  given 
the  channel  output  sequence  with  the  bit  error  probability  of  the 
estimates  through  a  form  of  Fano's  inequality. 


H(L,/Y%) 


*  H ( L , /Y*L , L  2 ) 


2.3.1a 


$  H(L,/L.) 


4 


z 


) 


b 


c 


s  X  h<l,*/l,*>  d 

*Ri  A 

$  jL  He{P{L(it  f  L|<t  ))  ;  Hf(x)«-x  log  x  -  (1-x)  log  (1-x)  e 


**)  A 

5  n  R , H e (  1/nR,  Z  P(L,<Jk  *  L,tk  )  ) 


n  R,He(<Pfj,  >)  g 

^  A 

in  which  a  results  because  L|f  Y  ,  L,  is  a  Markov  chain 
(figure  2.3.2);  b  follows  because  entropy  cannot  be  decreased  by 
unconditioning;  c  is  due  to  the  fact  that  the  entropy  of  a  set 
of  random  variables  is  less  than  the  stun  of  the  entropies  of  the 
random  variables;  d  is  due  to  the  fact  that  entropy  cannot  be 
decreased  by  unconditioning;  e  is  due  to  the  usual  Fano's 
inequality  (equation  4.3.4  with  M*2  in  [9]);  f  is  due  to  the 
fact  that  He  is  a  convex  H  function;  and  g  introduces  a  new 
notation.  Thus  we  have 


Theorem  2.3.1  (Fano's  inequality) 


H(L,/Y  L2)  <  n  R  ,  He(<P*,,  >)  2.3.2  a 

Hd^/Y^L,)  $  n  Ra  H  j( <P t.2  >)  b 

H(L,L,/Y*)  *  n(R,+Ra)  He(R,/(R,+R,)  <P,.,  >  +  Ra/(R(+R?)  <P«(l  >)  C 


in  which  b  is  derived  similarly  as  a.  The  proof  of  c  is  slightly 
different,  for  which 


Figure  2.3.2  Hie  Markov  chain  process  of  encoding-decoding. 
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H(L,La/Y  ) 

«  H(L,Li/Yf'ZtL1)  2.3.3a 


A  A 


$ 

H(L, 

WA.I*,) 

b 

XT 

A  A 

A  A 

L 

**> 

H(La/L,L,)  + 

z 

t») 

Hd^/L.L,) 

c 

A 

< 

I 

*». 

2 

d 

A 

'll?*  A 

z 

Ht(P(L,.t  t  L,,* 

)) 

+  i  He(P(L2<i  f  L,(*  )) 

Xsj 

e 

nX’ 

,  H  e(l/nR,  I  P(L, 

■*»£»  A 

< 

nR 

,*♦ 

L.,*))  ♦  nR2He  (1/nR,  Z  P(L,,**  L,,*)> 

A** 

f 

-  nR,  He(<Pe#l  >)  +  nRj  Hft(<Pe,2  >) 

«  n(R  ,+Rj)  H e(R  ,/(R,  +R,  )  <Pf#,  >  + 

in  which  the  line  of  reasoning  from 
(2.3.1  a-g);  and  h  follows  from  the 
Ci  function. 

Furthermore 
H ( L , /L  t )  s  n  R  , 

H(La/L, )  *nS, 

H(L,L2  )  *  n(R  ,  +  R,) 


9 

R  j/(r#  +rj  )  <pe,»  >  >  h 

a  to  g  is  the  same  as  that  of 
fact  that  He(x)  is  a  convex 


Q  •  ES  •  D  • 


2.3.4  a 

b 

c 
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Subtracting  (2.3.2) 

from  (2. 

3.4)  gives 

n  8,  He(<Pe>,  >) 

T> 

*  n  R,  - 

I(L,  ; Y  /Lt) 

2.3.5  a 

n  R,  He(<Pt>I  >) 

£  n  R2  - 

I(L2;y7l.) 

b 

rKR.+R*}  He(R,/(R,+R,)  <Pe#(  >  +  R2/(R,+Rj)  <Pe>t  >) 

>  n(R,  +  Ra)  -  I  (L,  L2;Y*)  C 

The  following  arguments  are  mostly  concerned  with  upper 
bounding  the  values  of  the  mutual  information  in  the  above 
inequalities. 

Theorem  2.3.2  (Data  processing  theorem) 

Id-,;?" /L,)  «  I(X?  ;y7xT  ) 

I  (La;Y*/L,  )  $  U*:,*"/*?) 

KL.L.jY*)  4J  1(Z*  Z?  fYm  ) 

Proof 
Consider 

-  I<x7  L,  }Y*  /LZX?  ) 

«  I(X,n  ;Y*  /L,x7  )  +  KL,  jy'*  /L,X?  x"  )  2.3.7 

The  second  term  equals  .  zero  since  Y  provides  no  new 

information,  given  X^,  about  L,  ,  due  to  the  fact  that  Lf,  X  , 

^  ,  *  **  m  * 

Y  is  a  Markov  chain.  The  first  term  equals  I ( X ,  ;Y  /X, )  since  Lz 

m  m 

,  X,  ,  Y  is  a  Markov  chain.  Hence 


2.3.6  a 

b 

c 


I(X,  L,;Y  /LjXj  ) 


I  (X  ,  ;  Y  /X,  ) 


2.3.8 


On  the  other  hand, 


I(X?  L.jY*’  /L,xT  ) 


I  (L,  JY*  /L,*?  )  +  I(xT  ;Y*/L.L,xf  ) 


2.3.9 


in  which  the  first  term  equals 


I  (L , ;  Y  Lt  X  2  ) 


2.3.10  a 


*  I(L, ;Y  La ) 


I(L,  ;Y  /La) 


with  a  due  to  the  fact  that  for  A  independent  of  C 


I ( A;BC) 


I (A;C)  +  I ( A ; B/C ) 


I (A;B/C) 


2.3.11 


b  is  due  to  the  fact  that  mutual  information  is  never  increased 
by  providing  less  observation  and  c  follows  from  the  identity 
(2.3.11).  The  second  term  in  (2.3.9)  is  trivially  lower  bounded 
by  zero.  Hence  putting  (2,3.10)  into  (2.3.9)  gives 


I(L, ; Y  /La ) 


4  I(X?  L,;Y*  /L.x"  ) 


2.3.12 


Combining  (2.3.8)  and  (2.3.12)  gives 


I(L,;Y  /Lt) 


$  I (X,  ;  Y  /X,  ) 


which  is  (2.3.6a).  The  proofs  for  (2.3.6b  and  c)  follow  the  same 


argument. 

Q.E.D. 


Let  the  subscript  k  denote  the  k-th  symbol  of  a  sequence. 

The  following  theorem  upper  bounds  I(X,  ;  Y  /X*  )  in  terms  of 

>/  w 

the  statistics  of  the  codebooks.  Recall  that  Xrs  ,  ZJS  ,  l^s^N, 
are  the  random  variables  for  the  s-th  symbol  for  the  codewords  in 
the  codebooks  C,  and  Ca  respectively. 


Theorem  2.3.3 

■**  'X  *h 

I (X  ,  ; Y  /x,  ) 

n 

I(Xa  ; Y  /X,  ) 

**  *  <*» 
IU  ,  X,  ;Y  ) 

in  which 


$ 

£ 

< 


Z  lUw* 

TO 


l 1  <**« 


s**  /Kt  > 

«*» 

?**  /*,,*  ) 


p<<*  )  -  p<x^  ) 

w  *" 

P(X*iit  )  -  l/(N-n+l)  P(X5  s  ) 


2.3.13  a 

b 

c 


d 

e 


Remark : 


The  statistics  of  X 


is  the  same  as  X 


is  the  same  as  Xf^-,  since  we  treat 
Xf.,  as  unshifted  in  time.  Consider  the  statistics  of  X2 Since 
the  delay  is  evenly  distributed  in  the  region  0  to  N-n,  only  the 
s-th  symbol  with  k^sSN-n+k  can  fall  into  the  k-th  position,  with 
probability  l/(N-n+l).  Hence  X2^  is  a  random  variable  with 
statistics  that  eguals  the  average  statistics  of  the  code  symbols 
that  can  possibly  fall  into  the  k-th  position.  Thus  from  the 
point  of  view  of  the  first  user  at  the  k-th  position,  the  second 
user  has  statistics  equal  to  that  of  X*  defined  in  (2.3.13e). 


Proof : 


I(X,  ?Y  /X,  ) 

<» 

5  “n  »  n  /•<  ^ 

"  fzi  ^  /^i  )  2.3.14a 

*  X  [H(Tt  /».»,•».  ...Y^,  l-HUj/1,1,  I,  I,  •••*„  >]  b 
«  |  )  -  H(Yj/x”.  X~  )]  c 

'  |,  iY.’/X,!  )  d 


in  which  a,  b  and  d  are  due  to  well-known  information  theoretic 
identities.  The  first  term  in  the  square  bracket  in  b  is  upper 

m  'll 

bounded  by  H(Y^  /XJ(t  )  since  unconditioning  cannot  increase  the 
entropy.  The  second  term  in  b  equals  H(Y*  /X,(*  XI(*  )  since  the 
channel  is  memoryless.  The  equations  (2.3.13  b  and  c)  are  proved 
similarly.  Furthermore,  the  joint  probability  distribution  for 


tit"  H  -  ^  O 


*.  •.  \  \  *.  •.  \, 
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^  14  *n 

-  p<*,.*x*,»  Yt  > 

*  I  P(d)  P(Xt’ x’4Yj  /d)  2.3.15a 

-  z.  pc3>  p(x"4Y^Yr )  b 

-  Z  P(d)  P >  P<Y*  /*,*  X>,W  )  c 

-  |  P(<3>  P(X,“*  )  PU,.».J  )  P(!(>,.",  2,.,“,,  )  d 

P(X  *  )  P(7“  /X,"  X,",  ,  )  J  P(d)  PU^  )  e 

p(x,"t )  pu**  )  pfC/C  *;»  >  £ 

where 

p(X*t  )  .  2  P(d)  P(X„N,<1  ) 


in  which  a  and  c  follows  from  conditioning;  b  follows  from  the 
*"  N  •»  w 

fact  that  xt£.«xf£.,  and  x4  ax7<|+  , .  for  given  d;  d  follows  from 

*  '  w 

the  independence  of  Xr^  and  X* ;  e  follows  from  the  fact  that 
the  channel  transition  probability  is  independent  of  d  given 

H  H 

xtV'x*-fc44  '  an<^  ^  defines  the  random  variable  X*^. 

Q.E.D. 


If  we  make  N  large  while  keeping  n  constant,  i.t  follows 
readily  from  (2.3.13  e)  that  P(X*^  )  are  almost  identical  for  all 
l^k^n  since  the  boundary  effect  may  be  ignored.  Thus  we  shall 


approximate  P(X^  ^  )  by  a  limiting  distributions  (X  2  ) 

Subsequently,  (2.3.13  a,  b  and  c)  become 


N  rn 


I(X,;Y/X3)  $  I  I(XI<4  ;Yt  /X,) 


2.3.16  a 


n  N 


I(Xa  ;Y  /X,  )  $  1  I(X,;Y */*,,*> 

i-i 


<q  •»  „  21  */  -n 

I  (X  ,  X  ,  ;  Y  )  *  I  I  ( X ,  ^  X  s  ;Yj  ) 


Since  mutual  information  is  a  convex  A  function  of  the  input 

n 

probability  distributions  X,^.,  the  right  hand  side  of  (2.3.16  a) 
is  further  upper  bounded  by  n. I (X, ;Y  . /X,)  in  which 

P,(X.)  «  1/n  £  P(Xf*  ) 


P(x,xay)  *  P,(x,)  Pj(x,)  P(y/xtx,) 

Similarly,  the  right  hand  side  of  (2.3.16  c)  is  upper  bounded  by 
n.I (X,X4;Y) .  The  right  hand  side  of  (2.3.16  b)  equals 

n  T  •  V  /Y  \  ^iia  f  a  1  i  noa  r  i  i  n  D  f  A  \  Fa**  mi  1 4-  1  in  f  a  **me  V  4  aa 


■?  ■!'  ■»  «ji  f  .■ 


i'„ »f «  l  ’;j  ■'.»  •?•••;  y'.'r.1  7777  im 
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when  we  consider  decoding  Cor  user  1. 

Combining  the  chain  of  inequalities  gives 


n  R  x 

£ 

n  Rf 

-  n  I(X,;Y/X2) 

2.3.17  a 

n  R2 

He(<P,,,  >) 

n  R2 

-  n  I (X2; Y/X, ) 

b 

n(R,+R2)  H  e(R,/(R,+Rj )  <Pe>,  >  +  R*/(R,  +RS )  <PC/J  >) 

-  ,  £  n(R, +R2 )  -  n  I(X,X,;Y)  c 

Since  Hc  (x)  is  a  nondecreasing  function  in  the  interval  [0,1/2], 
we  may  express  finally  the  converse  of  the  coding  theorem  for  the 
two-user  asynchronous  multiple  access  channel  in  the  following 
form 

Theorem  2.3.4 

<Pe,  >  £  H  (l-I(Xf  ;Y/X,)/R,)  2.3.18  a 

<Pe.2  >  5-  H ,  (1-I(X2;Y/X,  )/R2)  b 

R  i/(Ri  +R j )  <Pe..  >  +  R*/(Ri  +R* )  <Pe,»  > 

*  H  l'  (1-1  (X,  X,  ;Y)/(R,+R2  )  C 

in  which  for  O^y^l,  the  function  Hft! (y)=x  such  that  He(x)=y;  and 
for  y<0,  He.,.(y)  =  0.  Hence  reliable  communication  is  not  feasible 
if 


\ 


R 


>  I (X, ; Y/Xj ) 


Rz  >  I(Zi;Y/X1) 

or 

R  ,  +  R2  >  I(X,X2;Y  ) 

for  all  mutually  independent  probability  distributions  P,(XT)  and 
P4(Xx).  Therefore,  £C 


Theorem  2.3.4  concludes  the  proof  of  the  converse. 


Appendix  2.1  Preambles  for  synchronization 


Section  2.2  assumes  that  the  preambles  can  be  detected  and 
correctly  located  in  time.  This  section  shall  give  a  description 
of  this  synchronization  mechanism  and  compute  the  probability  of 
failure  to  achieve  synchronization,  averaged  over  the  code 
ensemble  defined  in  section  2.2. 


The  synchronization  mechanism  works  as  follows.  Suppose 
the  preamble  sequence  b„  for  the  first  user  starts  at  the  s#  -th 
symbol  of  channel  output  sequence  Y  .  (Recall  that  m  is  the 
number  of  codewords  between  two  preambles,  and  n  is  the  length  of 
a  codeword.)  Let  Yts)  be  the  subsequence  of  Y  which  starts  at 

•V.'j'l 

the  s-th  symbol  and  ends  at  the  (s+n-l)-th  symbol  of  Y  .  Let 
S  -  {s  :  (b„,  y(S)  )  c  T*  T 

in  which  T™  is  the  set  of  € -typical  n-sequences  defined  by 

•n  t> 

-  {  (x,  ,  y  )  e  X,  X  Y 

:  1/n  1  -log  P(x^  ,y”)  “  H(x  ,Y)  l  <  e 

1/n  \-log  P(x7  )-log  P(y  Vh(X  ,  )-H(Y)  \  <  e  ] 

Notice  that  we  do  not  require  joint  typicality  with  the  codewords 
of  the  second  user  while  we  obtain  the  synchronization  of  the 
first  user.  The  effect  of  the  code  sequence  of  the  second  user  is 
included  in  P(Y),  which  depends  on  Pa(Xz  ). 

Synchronization  failure  for  a  given  code  C  generated  in 
random  is  defined  as  the  event 


-’v  *7 


F(C)  *  (s,  f  S)  D  (  some  s  s(  and  s  e  S  ) 

Over  the  set  of  random  codes  <£  ,  the  event  of  synchronization 
failure  is  given  by 


F(<L)  -  (  (B„,  Y(s>)  )  i  t”  )  U  (  U  (  (B„,  Yw)  )  6  )  ) 

Hence  using  the  union  bound 

»«»» 

P(F((L))  S  P(  (B,,YIS#)  )  *  T6”  )  +  Z  P(  (B„,Y„;  )  *  T  ”  ) 

5  #S* 

The  first  term  is  upper  bounded  by  €,  by  the  definition  of 
^-typical  sequences  provided  that  n  is  larger  than  some  nff .  Let 
the  subscript  k  denote  the  k-th  symbol  of  a  sequence  and  consider 


each  term  in  the  above  summation. 

P(  )  «  T«  ) 

-  )  a 

"  Z  TT  ^ 

v  *-» 

•  I  £  p(b«>-p(y«.*  > 

$  Z  exp.(  -n  (H(X, )  +  H(Y)  -e)  )  d 

V 

<  expj(  n(H(X(Y)  +  *)  ).  exp 2(  -n  (H(X()  +  H( Y)  -  e)  )  e 

-  expa(  -n  (I  (X  ,;Y)  -2  fr)  )  f 


in  which  b  is  due  to  the  facts  that  the  codewords  are  .  generated 
independently  symbol  by  symbol  and  that  the  channel  is 
memoryless;  c  is  due  to  the  facts  that  s  *  s.  and  that  the 


codewords  are  generated  independently;  the  upper  bound  in  d 
results  from  the  definition  of  T{  ;  e  is  due  to  an  upper  bound 
on  the  cardinality  of  an  € -typical  set,  derived  similarly  to 
that  in  section  2.2.  Hence 

P(  F«t)  ) 

>*-n 

ft*  Z.  exp  (  -n(I(X,;Y)  -  2e)  ) 

^  c  +  mn  expa (  -n(I(X,;Y)  -  2e)  ) 

Thus  provided 

I (X , ; Y)  -  1/n  log  (mn)  -  2e$  $ 
or 

1/n  exp,(  n(I  (X,  ;Y)  -  £  -  2e)  )  £  m 

for  some  fixed  positive  $  , 

•  n  S 

P<  F ((C)  )  S  2 

thus  synchronization  can  be  made  reliable. 

The  reliability  of  synchronization  for  the  second  user  can 
be  similarly  established. 
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Appendix  2.2  On  sets  of  6 -typical  sequences 


We  want  to  show  that  for  each  €  >o,  there  exists  an  r 

C 

such  that  for  r>r#  , 

P(  (X,+  (j^  )fX1(JaIMM’  ),Yr)  i  tI  )  $  t 

in  which 

“  {  (x  *  ,x  x  ,yr) 

:  1/r  1-log  P(x*  ,x,  ,yr/W,W7)  - 

(  IK,|  [H(X,X,Y)3  +  IK,)  [H(X,  )+H(XaY)]  + 

IK,}  (H(Z&Y)+H(Xa)]  +  ]K*I  [H(X,  )+H(Xa  )+H(Y)  ]  )j<  & 

,  for  all  W ,  c {1, . . . ,m] f  W4  c {0,1, . . . ,ml  ] 


Proofs 


Consider  the  random  variables 


z*(W,,w,)  *  -log  P(xlft  ,xak  ,y4r /W,  ,w,) 

Let  h(k)  «  i  if  k  <  K;  .  Hence  E[zt (W, , W, ) ]*H  ^  where 


H(X,XjY) 


if  i  a  1 


H(X  ,)  +  H(XjY) 


i  =  2 


H(X,Y)  +  H(XZ) 


i  -  3 


-  H(X  , )  +  H(X, )  +  H(Y) 


i  -  4 


>V r\*''Jr.*-*r,r4jr.  r.',  f..  r. 


Define  the  random  variable 


ur<w,,w2)  -  1/r  I  zt(w(,w}) 

Thus,  we  may  equivalently  express  T*  as 

«  {(x,+  ,xJ  ,yr)tX,rx  X^X  Yr:  )ur(W,,W1  )  -  u  ,(W ,  ,W,  )|  <6  for  all  W 


By  the  Chebychev  inequality, 


x< 


P(  l  Ur(W,,W?)  -  ur(w,  ,wa  )  Z6  ) 

.  i 


V**1  I;  /  e1 


4  1/r  *  £  O'w.M  /  €  * 

**» 


A  el 


where  <W  is  the  maximum  value  of  the  four  possible  values  of 
the  variances  ffzfc(v(/w4)  •  In  order  that  (x*,xz  ,yr )  4  Tfr  ,  it  is 
necessary  that  lu^W,  ,Wt )-ur(Wf  ,WZ )|  £  £  for  some  Wr,WA,  which 
occurs  with  a  probability  upper  bounded  (through  union  bounding) 
by 


,  Z  P(  I  u  (W,  ,W*) 

w-.w, 

$  G***  /  r  €  a 

W..W, 


-  Ur(w,  ,wa)  l  ^  e  ) 


a****!  * 

2  <r, 


/r  € 


_v *. 


Hence 


P(  (X,4  ,Xa/Y^)  4  T  * 

»*i+/  I 

$  2  G"***  /r  el 

.<  € 

if 

*h*>  * 

r  >  r.“  t  2  6'mjut  /  e* 


Chapter  3  The  Multiple  Access  Channel  with  a  Variable  Set  of 
Simultaneous  Users. 

3.1  The  coding  theorem 

Consider  a  communication  system  with  M  synchronous  users, 
each  generating  n-bit  packets  at  a  Poisson  rate  of^/M.  Each 
packet  must  be  received  with  a  delay  of  less  than  D  from  its 
generation  time.  For  channels  with  a  large  bandwidth  which  is 
used  by  a  large  number  of  users,  the  maximum  tolerable  delay  can 
be  a  significant  factor  in  the  design  of  the  access  scheme.  As 
shown  in  section  1.3,  the  round-robin  TDMA  scheme  would  incur  an 
average  delay  of  M/[2(l-A)]  slots.  Thus  for  large  M  or  A  close 
to  one,  the  delay  becomes  intolerable.  We  are  interested  in  the 
case  when  the  maximum  tolerable  delay  D  is  much  smaller  than 
M/[2(l- A)  ] . 

This  chapter  shows  that  imposing  the  bound  D  on  delay 
reduces  the  maximum  throughput  of  the  channel.  An  intuitive 
explanation  goes  as  follows.  Within  the  delay  D,  the  number  of 
active  users  (those  with  a  message  to  send)  is  N  =  DA  +  0(  VdX  ) . 
We  assume  that  N«  M.  Thus  the  N  active  users  would  have  to 
contend  for  the  D  {«  M)  slots.  Without  feedback,  conflict  free 
scheduling  is  impossible.  To  ensure  reliable  communication  within 
the  delay  D,  it  seems  that  an  active  user  should  transmit  the 
coded  message  as  soon  as  it  is  generated,  and  spread  the  message 
over  a  time  period  of  D.  Such  a  system  behaves  asynchronously, 
even  though  the  transmitters  are  synchronized. 
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Theorem  3.1.1  generalizes  and  makes  precise  this  intuitive 
notion  of  "asynchronism"  as  a  result  of  the  uncertainty  about  the 
set  of  active  users.  The  theorem  is  based  on  the  model  that 
exactly  N  out  of  M  synchronous  users  (N  <<  M)  are  active,  and 
that  each  of  the  M  choose  N  combinations  are  equally  likely.  We 
assume  that  there  is  a  zero  element  in  the  code  alphabet  X  (which 
is  the  same  for  all  users)  so  that  an  inactive  user  would 
transmit  the  zero  element.  The  codebook  C.  of  user  i 

consists  of  2  _  codewords.  The  effective  rate  of  each  user  is 

s.  *  (N/M)  since  a  user  is  active  with  probability  N/M. 

We  assume  that  the  channel  is  memory less  and  symmetrical 
with  respect  to  the  M  users,  that  is,  P(y/xlxt...x(()  is  the  same 
if  the  input  symbols  for  the  users  are  interchanged.  Furthermore, 
we  assume  that  the  users  are  symmetrical  in  what  they  can  do. 

The  communication  system  is  reliable  if  and  only  if 
reliable  communication  is  achievable  for  any  set  of  N  users.  The 
capacity  region  S  is  the  set  of  (sr ,st , . . . ,sH)  such  that 
reliable  communication  is  achievable  for  the  system.  The 
sum-throughput  of  the  channel  is  defined  by 

s'  -  s,  +  sa  +  ...  +  sM 

and  the  sum-capacity  is  defined  by 

s  *  ■  max  s 


S 
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The  coding  theorem  can  be  stated  as  follows. 

Theorem  3.1.1 

♦  * 
s  ■  max  I  (X  ;  Y) 

PU) 

in  which  X,  denotes  a  set  of  N  independent  and  identically 
distributed  random  variables  with  probability  distribution  P(X) 
and  Y  has  conditional  statistics  P(y/x,...xM)  where  x^.€-X, 
l^i$N,  are  the  symbols  put  on  the  channel  by  the  N  active  users. 

As  an  illustration,  consider  the  collision  channel.  Let 

*1 

P(x)  -  p  if  x  is  an  idle  and  P(x)  «  (l-p)/2  if  x  is  a  packet. 
It  can  be  readily  shown  that  for  large  n, 

N  W“*  , 

I(X  ;  Y)  «  N  (l^p)  p  n-bit 

w-i 

which  has  a  maximum  value  of  (1-1/N)-  when  p  =  1  -  1/N. 

♦  -l 

Therefore  s,-  «  e  for  large  N.  This  is  the  same  as  the  capacity 
of  the  asynchronous  multiple  access  channel  with  a  large  number 
of  users. 

Before  proving  theorem  3.1.1,  we  shall  prove  the  following 
coding  theorem  on  the  capacity  region  of  the  M  choose  N  channel. 

Theorem  3.1.2 

Let  be  the  set  of  simultaneous  users.  (Thus  "fc  {1 , ... .  ,Mj  and 
The  capacity  region  S  is  given  by 


S  *  convex  hull  0  S 


P;  (XJ,  l$i£M 

AS 

in  which  (s, ,s, , . . . ,sM)  e  S  (each  S  is  defined  by  a  set  of 
P.(X.))  if 

A*  *• 

s-  <  N/M  Kd^iY/IX,]^) 
for  all  A=  f  -Jl  ;  and  for  all  f". 

Proof 

A/ 

As  an  illustration,  figure  3.1.1  gives  S  for  the  case  of 
M*3  and  N=2.  S  is  the  intersection  of  three  pentagonal  cylinders 
(each  due  to  a  t).  Figure  3.1.2  shows  the  familiar  case  of  M=N=3. 

For  given  Y\  it  is  well  known  [16]  that  the  capacity 
region  for  the  synchronous  N-user  multiple  access  channel  is 
given  by  ft,  such  that 

A/ 

ft*  =  convex  hull  U  A 

independent  PA-(X;),  i  e  "f" 

A/ 

such  that  (R,  ,RZ, . . .  ,R„)  e  A  if 

ith 

for  all  HC'jr,  XI*  and  Each  of  these  <R  ,  extended  in 

the  M  dimensional  space  stXstX...XsM  is  a  cylinder.  Since  we 
require  reliable  communication  for  all  "'fp,  the  capacity  region  for 
the  M  choose  N  channel  is  the  intersection  of  these  cylinders, 
with  the  substitution  s.«N/M  R.. 

*.  JL 


In  other  words, 


S  =  convex  hull  U  S 


independent  P^(Z^),  ie^ 
such  that  (s^s^.-.jS^e  S  if 

for  all  /I  c  yfrf  xit  4  ,  and  Si  =  ^  -  fl ;  and  for  all  Y'  .  There  remains 
the  question  of  identifying  V".  For  the  proof  for  achievability , 
it  is  necessary  to  show  that  can  be  identified  at  a  small  cost 
of  channel  capacity.  For  the  purpose  of  identification,  each 
active  source  transmits  a  preamble  sequence  of  length  r  before 
transmitting  his  codeword.  The  preambles  are  generated  by  random 
coding  according  to  the  probability  distribution  Pa(X).  Appendix 
3.1  describes  the  identification  mechanism  and  establishes  the 
reliability  of  identification  of  the  set  of  active  users  (YO  in 
terms  of  the  number  of  users  M  and  the  length  of  the  preamble  r. 
It  is  shown  that  the  length  of  the  preamble  required  is 
proportional  to  log  M,  which  we  assume  to  be  much  smaller  than  n, 
the  length  of  the  codeword.  For  the  converse  of  the  coding 
theorem,  we  assume  the  help  of  a  genie  that  reveals  the  set  of 
active  users 


Q.E.D. 


As  a  consequence  of  the  above  characterization  of  S,  we 


■xy 


Vt'O 


Theorem  3.1.3 


The  capacity  region  S  is  convex  and  symmetrical  in  the  s^' s.  Thus 
the  siun-capacity  occurs  along  the  main  diagonal  where  the  rates 
of  the  M  users  are  all  equal. 


Proof 


The  convexity  of  S  is  established  by  time  sharing  which  is 


3.2 


Proof  of  the  direct  part 


Before  proving  the  direct  part,  we  would  like  to  show  how 
the  model  of  theorem  3.1.1  applies  for  channels  with  Poisson 
message  arrivals.  Consider  the  following  access  scheme.  Time  is 
divided  into  periods  of  length  0/2.  All  packets  generated  in  one 
period  are  to  be  transmitted  in  the  next  period.  Thus  all 
messages  are  delivered  within  a  delay  D.  The  number  of  messages  N 
generated  in  a  period  of  D/2  averages  Ad/2.  For  large  D,  N  can  be 
approximated  by  its  average  by  the  law  of  large  number. 
Furthermore,  it  is  unlikely  that  a  single  user  would  have  more 
than  one  packet  generated  in  a  period,  since  D  «  M/[2(l-A)]*  If 
a  user  has  more  than  one  packet,  it  would  treat  the  packets  as  if 
they  are  from  separate  sources.  Thus  this  scheme  can  be 
approximated  by  the  M  choose  N  model  of  theorem  3.1.1.  Each 
active  source  transmits  a  preamble  sequence  of  length  r  at  the 
beginning  of  a  period.  The  message  is  transmitted  during  the  rest 
of  the  period,  which  is  of  length  D/2  -  r  symbols. 

Consider  letting  the  X^.  '  s  in  theorem  3.1.2  be  M 
independent  and  identically  distributed  random  variables.  Since 
we  assume  that  the  channel  is  symmetrical  with  respect  to  the  M 
users,  the  quantity  1  ^/{^.]  .c  in  theorem  3.1.2  may  be 

abbreviated  for  ail  *>s  I  (XK; Y/X  ”  K)  where  K ~/n/  and  N»/^7*  We 

ft 

want  to  show  that  the  joint  with  s^  »  1/M  I (X  . ;  Y),  for  all 
l<i<M,  is  in  the  S  detuned  in  theorem  3.1.2  with  P.(X.)«P(X)  for 

K  A- 

all  i.  This  is  true  if 


K  (1/M  I(X~  ;  Y)  ) 


$  N/M  MX*  ;  Y  /  X*'*  ) 


for  all  K  such  that  l£K<N.  Equivalently,  we  want  to  show  that 
K/N  MX*Y)  <  I  (X*;Y/X  )  -  I(X^iY)  -  I  (X*~*;  Y) 


I(X  W"*;Y)  $  (N-K)/H  KX^jY) 


for  all  lcKOI.  Raplacing  M-K  by  X,  *  is  equivalent  to 
I(X%Y)  *  R/N  KX^JY; 


for  all  UX<N. 


Proof 


Consider 


D  ■  1(1  ;Y)  -  I (X  ; Y) 


-  MX; Y/X  ) 


-  -  H(X/YX*) 


•  H(X)  -  H(X/YX  ) 


Due  to  the  fact  that  conditioning  decreases  the  entropy,  H(X/YX,  ) 
decreases  as  K  increases.  Therefore  {D  is  an  increasing 
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sequence  in  K.  Thus 
I(X*;Y)  <  K/N  I(xV) 

Q.E.D. 

We  have  shown  that  the  point  with  s;  *  1/M  I  (X  ;  Y), 

l$i$M,  is  achievable.  Thus  an  achievable  sum-throughput  is 

M 

s  -  X  1/M  I(X^  ;  Y)  •  I(X  ;  Y) 

This  completes  the  proof  of  the  direct  part  of  theorem  3.1.1. 


3.3 


Proof  of  the  converse 


This  section  proves  that  reliable  communication  is  not 
possible  if  the  sum-throughput  s  is  above  the  sum-capacity 

s  ■  max  I (Z  ;  Y) 

PU) 

for  M  »  N.  We  shall  also  show  that  for  the  collision  channel 

with  N/M  *  f  Hi 

4  k-l 

s  -  k  f  (1  -  f) 

in  which  k  satisfies  the  inequalities 

1/k  *  f  *  l/(k  -  1) 

»• 

The  proof  of  the  converse  consists  of  the  following  theorems 

Theorem  3.3.1 
The  sum-capacity 

♦  A 

s  *  max  s  ■  max  max  s 

(Sj  t  •  • .  t  sM  )«S  S  (s,  i  •  • .  ;S^)fS 

The  last  equality  suggests  that  the  sum-capacity  may  be  achieved 

<v 

by  some  random  coding  scheme  without  time  sharing.  (Each  S 
corresponds  to  a  pure  strategy.) 


Proof 
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The  sum- throughput  s  is  a  linear  function  of  the  s.  's. 

A/ 

Furthermore,  S  is  the  convex  hull  of  the  S  ’  s.  Therefore  the 

/v 

maximum  of  s  over  S  is  achieved  at  the  corner  point  of  some  S. 

Q.E.D. 

Theorem  3.3.2 

s*  <  max  1/MC*  £  I({XA.l  ;  Y) 

i  * 

Proof 

a/ 

For  any  P^(X^),  l$i$M  and  any  s,,...sM  in  the  corresponding  S, 
sum  those  inequalities  with  /l =  )^  in  theorem  3.1.2,  we  have 

.  $£+  v'*  }UiX;htrY) 

Using  simple  counting  arguments,  the  left  hand  side  equals 
M 

•V*  „  C*  X  s- 

K~\ 

The  theorem  follows  by  substituting  this  inequality  in  theorem 
3.3.1. 

Q.E.D. 

What  remains  for  proving  the  converse  is  that  for  large  M, 
the  right  hand  side  of  the  inequality  of  theorem  3.3.2  becomes 
* 

max  I  (X  ;  Y) 


P(X) 
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Proof 

Treating  1/^C^  as  the  value  of  the  probability  distribution  P()^), 
we  have 

l/«cw  l  U{i-  Y  ) 

-  t  P(*)  7  /f  ) 

-  I (Z , * . .  Xm;Y  /  $  ) 

in  which  iE  is  the  set  of  all  with  /X/  -  N. 

Furthermore, 

I  (X, — XM  ;  Y/$  ) 

-  H(Y/$) 

$  H(Y)  -  H(Y/§  X,  ...XM  ) 

Now 

H(Y/£x#...Xm) 


-  Z  ,...xM)  Hil/fx,  ...  xN) 

VcJ 

xnX.YC 


The  event  fx,...xM  corresponds  to  choosing  N  variables  without 

replacement  from  the  M  variables  X,  For  large  M,  the 

probability  P(-Vx,...xM)  can  be  approximated  by  the  case  of 

choosing  with  replacement.  Thus  H(Y/3Lx. . .XM. )  can  be 

N  fJ 

approximated  by  H(Y/X.  )  in  which  X.  is  a  set  of  N  independent 
and  identically  distributed  random  variables  with  probability 
distribution 


PU)  *  1/M  Z  P-(X.) 
in 

Therefore 

I(X,...XM  ;  Y/  §  )  £  KX-  ;  Y) 

Thus  we  have 

s  <  max  I (X  ;  Y) 

P(X) 

This  completes  the  proof  of  the  converse. 


For  cases  when  N  is  not  substantially  smaller  than  M  (that 
is,  N/M  ■  f  >  0),  the  theorems  3.3.1  and  3.3.2  remain  valid. 
Define  f*N/M.  The  remainder  of  this  section  derives  an  upper 
bound  on  the  sum- throughput  for  the  packet-synchronized  collision 
channel  in  terms  of  f.  Assuming  P^(x^)  «*  p.  if  x^  is  an  idle  and 
P.  (x.)  ■  (l-p.)/2  --  if  x.  is  a  packet,  it  can  be  shown  that 

Jk  Jk  Ap  A 

-  Z  t  (1"P;  )/p/  ]  Tf  P;  n-bit/code  symbol 
**  *  7 

Substituting  this  value  into  the  inequality  of  theorem  3.3.2 
gives 


* 


!  VU-P;>/P;  1  £VPj 


s 


< 


The  right  hand  side  is  a  multilinear  function  of  the  p^.  '  s. 
Therefore,  it  has  a  maximum  which  occurs  when  a  certain  set  of 
the  p.  's  are  1  while  the  rest  of  the  p.  's  are  0.  Since  this 
function  is  symmetrical  in  the  p^.  '  s,  we  assume  that  p^*l  for 
l^i^k  and  p^*0  for  k+l$i$M.  Using  simple  combinatorial  arguments 
for  the  right  hand  side  of  the  above  inequality  gives 


S  <  k  *-JcCK-l  /mC>/ 


The  maximum  occurs  for  the  smallest  k  satisfying 


*  M-k  cn~I  >  H'tk-n)  c  M-l 


For  large  M,  and  f  that  is  not  too  close  to  zero,  it  can  be  shown 
from  the  above  inequality  that  the  optimal  value  of  k  depends 
only  on  the  value  of  f.  This  fact  will  be  of  use  later  on.  Now 

S  <  k  M-k  C  AM  C  H 

-  k  { (M-k) !/[ (N-l) !  (M-k-N+l)!]  1  .  {N!  (M-N)!/M!  } 

«  k  N  {(M-k)!/M!3  .  { (M-N) ! /(M-N-k+1 ) !] 

-  k  N  .  { (M-N)k"1  7 


for  large  M,  when  k  depends  only  on  f 


k  N/M  (1  -  (N/M)  ) 


k  f  (1-f) 


For  given  f,  the  value  of  k  that  maximizes  kf(l-f)  satisfies 
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1/k  *  f  <  l/(k-l) 

For  small  f,  we  have  kf»l,  hence  s*<  e  1  which  agrees  with  our 
previous  derivation  in  section  3.1. 

t-i 

The  value  kf  (1-f),..  has  a  special  interpretation,  which 

corresponds  to  the  throughput  of  the  following  scheme.  The  M 

users  are  divided  into  M/k  groups,  with  k  users  in  each  group. 

Each  group  uses  a  slot  in  a  round-robin  TDMA  scheme  with  M/k 

slots  per  frame.  The  k  users  in  a  group  would  contend  for  the  use 

of  the  slot.  The  throughput  of  the  channel  can  be  shown  to  be 
k-i 

kf.(l-f),  -  This  scheme,  in  fact,  bears  certain  resemblance  to 
the  urn  scheme  proposed  by  Kleinrock  and  Temini  [11]. 


Appendix  3.1  Preambles  for  user  identification 

This  appendix  shows,  using  a  random  coding  argument,  that 
the  amount  of  preamble  required  for  reliable  identification  of 
the  set  of  active  users  is  proportional  to  log  M. 

The  identification  mechanism  works  as  follows.  A  random 

r 

codebook,  with  codewords  a., ,  l^i^M,  is  generated  according  to 

the  probability  distribution  Pfc(X) .  The  user  i,  l£i$M,  is 

r 

assigned  the  preamble  a.,  which  is  transmitted  at  the  beginning 

of  a  period  if  the  user  has  a  message  to  sent.  Without  loss  of 

generality,  we  assume  that  the  set  of  active  user  is  ■ 

{1,2,...,N}.  Let  Y-  be  the  channel  output  sequence  in  the 

preamble  field.  The  receiver  i  decides  that  the  transmitter  i  is 

r  r  r  »* 

active  in  the  period  if  (a. . ,y  )  €  Tg  ,  in  which  T^.  is  the  set 
of  ^-typical  r-sequences  defined  by 

T  -  {  (x’”,yr )  €  Xrx  Y^ 

:  1/r  l -log  P(xr,yr)  -  H(X,Y)|  <  6 

1/r  l-log  P(xr)  -  H(X)  \  <  k 

1/r  |-log  P(yr)  -  H{ Y)  \  <  e  ^ 

For  the  users  in  Y'  ,  failure  of  identification  for  user  i 
.  r  r  i  r 

(l^i<N)  occurs  when  (a.  ,y.  )  $  T€ t .  Over  the  code  ensemble,  the 
probability  of  failure  of  identification  for  user  i  is  bounded  by 
k,  for  all  &>0  provided  that  r  is  larger  than  some  rfl  (N ,€-)  as  a 
result  of  the  law  of  large  numbers.  It  is  noteworthy  that  r„ 


M 


does  not  depend  on  M. 


For  the  users  not  in  Yl„  false  identification  for  user  i 

r  r  r 

(N+l$i^M)  occurs  when  (a. .. ,y , ) .€  .  Over  the  code  ensemble,  the 

A?  » 

probability  of  false  identification  for  user  i  is 


* 


P{  (Al,Yr)  e  T,r ) 


I  i  p<a*:*'yr) 


I  IT  P(a£’l)  P(yr) 

V  k* 

I  exp  (-r(H(X)-f)  -  r(H(Y)-e)  ) 

V 

exp  z (r (H(XY)  +  e) )  expa  <-r <H(X)+H(Y)-2  e ) ) 


-  exp  j  ( —  r  ( I  (X;Y)-*3  6) ) 


in  which  b  follows  from  the  facts  that  the  preambles  are 

generated  independently  symbol  by  symbol  and  that  the  channel  is 

memory less;  c  is  due  to  the  facts  that  i  ^  V'  and  that  the 

preambles  are  generated  independently;  the  upper  bound  in  d 

r 

results  from  the  definition  of  T?  . ;  e  is  due  to  an  upper  bound 
on  the  cardinality  of  an  €-typical  set  derived  similar  to  that  in 
chapter  2. 

A 

Let  Y  be  the  estimate  of  Over  the  code  ensemble,  and 
applying  the  union  bound,  we  have 
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P(f  *  f) 

$  N*+  (H-N)  expa  (  -r  (I  (X;  Y)  -  3«)  ) 

-  Nfc  +  exp  2  (  -r  ( I  (X;  Y)  -  1/r  log(M-N)  -  3  e)  ) 

Consequently,  the  amount  of  preamble  required  for  reliable 
identification  of  the  set  of  active  users  is  proportional  to  log 

M. 


Chapter  4  The  Multiple  Access  Channel  with  Incomplete  Codebook 
Knowledge 

So  far,  the  multiple  access  channel  considered  assumes 
that  the  codebooks  of  all  users  are  known  at  the  receivers.  Thus 
each  receiver  may  determine  jointly  the  most  likely  code 
sequences  sent  by  the  transmitters.  This  joint  decoding  effort, 
however,  is  often  computationally  infeasible.  Consequently,  the 
signals  of  the  other  users  are  often  treated  as  interfering  noise 
and  are  not  estimated  for  noise  reduction.  For  example,  joint 
decoding  is  not  performed  for  direct  sequence  spread  spectrum 
multiple  accessing.  Also  for  the  sake  of  secrecy,  some  receivers 
may  not  have  complete  knowledge  of  the  codebooks  of  the  other 
users.  Furthermore,  jamming  may  occur  in  the  system.  These 
circumstances  result  in  incomplete  codebook  knowledge. 

This  chapter  progressively  defines  and  develops  insight 
into  the  issue  of  incomplete  codebook  knowledge  through  a  series 
of  coding  theorems.  Section  4.1  studies  the  reliability  of 
communication  for  a  single  user  over  a  memoryless  channel  with 
stationary  statistics.  The  decoder  may  assume  a  different  channel 
statistics  in  performing  maximum  likelihood  decoding.  The 
section  ends  with  a  theorem  which  states  that  reliable 
communication  can  be  achieved  up  to  I’,  a  new  information 
theoretic  quantity  which  ,  is  a  function  of  the  codebook 
statistics,  the  actual  channel  statistics  and  the  assumed  channel 
statistics.  Section  4.2  gives  a  proof  of  this  theorem.  Section 
4.3  investigates  the  properties  (convexity  id  bounds)  of  I*. 


Section  4.4  considers  the  model  of  section  4.2,  but  with  a  jammer 
on  the  channel.  The  section  starts  with  a  discussion  on  the  type 
of  constraints  that  may  be  imposed  on  the  signals  sent  by  the 
user  and  the  jammer.  Using  the  result  of  section  4.2,  we  obtain  a 
coding  theorem  about  what  the  user  and  the  jammer  can  do 
respectively  to  ensure  or  impede  reliable  communication.  Section 
4.5  examines  the  result  of  section  4.4  in  an  asynchronous 
multiple  accessing  environment. 

The  results  in  this  chapter  are  new  in  the  following 
aspects.  Unlike  mutual  information  in  classical  information 
theory,  the  information  theoretic  quantities  in  this  chapter  take 
into  account  the  structure  of  the  decoder.  Joint  decoding  is 
restricted  by  imposing  a  specific  decoder  structure.  The  decoder 
with  incorrect  knowledge  of  the  channel  statistics  was  introduced 
in  [13],  but  without  a  detailed  study.  The  result  of  section  4.2 
is  new  and  provides  significant  insights  into  the  problem.  The 
result  of  the  single  user  single  jammer  channel  of  section  4.4 
appears  to  be  similar  to  that  of  Blackwell  ,  Breiman  and 
Thomasian  [12],  Stiglitz  [13]  as  well  as  McEliece  and  Stark  [14]. 
In  [12],  the  unknown  channel  is  assumed  memoryless  and  has 
stationary  statistics  with  the  channel  transition  probability 
distribution  in  a  certain  permissible  set  Pc .  In  [13],  the 
unknown  channel  is  assumed  memoryless  but  may  have  non-stationary 
statistics.  The  channel  transition  probability  distribution, 
which  may  vary  from  instance  to  instance,  is  in  the  permissible 
set  Pc .  Both  [12]  and  [13]  assume  that  the  input  probability 
distribution  for  the  code  symbols  for  the  user  is  in  a 


permissible  set  Ps .  Both  use  a  mini-max  argument  and  arrive  at 
the  conclusion  that  the  channel  capacity  for  the  unknown  channel 
is 

C  -  min  max  I(X;Y) 

,  P(Y/X)«PC  P(X)«PS 

There  are  two  objections  to  their  models  in  the  presence  of 
jamming.  First,  unlike  the  unknown  channel  modeled  in  [12]  and 
[13],  a  channel  with  a  jammer  may  not  be  memoryless  or  have 
stationary  statistics.  Second,  coding  or  jamming  strategies  are 
usually  constrained  on  a  per  codeword  basis  (that  is,  the  total 
cost  for  the  symbols  of  each  code  sequence  or  jamming  sequence 
must  be  less  than  a  certain  amount)  rather  than  on  a  per  symbol 
basis.  While  section  4.3  arrives  at  a  similar  statement  on  the 
capacity  of  the  channel,  the  formulation  does  not  require 
memorylessness  of  the  channel  and  adopts  constraints  for 
signaling  on  a  per  code  sequence  basis.  The  proof  of  the  coding 
theorem  gives  new  insights  on  how  to  choose  the  best  decoding 
metric  to  for  various  jamming  strategies.  The  work  of  McEliece 
and  Stark  [14]  does  not  require  memorylessness  and  stationary 
statistics  for  the  user  and  the  jammer.  The  problem  they 
considered  involves  a  two  party  game  for  mutual  information 
between  the  transmitter  and  the  jammer  and  it  is  not  clear  how  a 
coding  theorem  may  follow  from  such  a  framework. 


[»] 


4.1  Communication  with  incorrect  model  of  a  memoryless  channel 


This  section  gives  an  achievable  throughput  of  a  channel 
with  a  decoder  which  has  an  incorrect  model  of  the  channel. 
Before  stating  the  result,  we  would  like  to  examine  the  meaning 
of  "  an  incorrect  model  of  the  channel  for  the  decoder". 


The  communication  system  consists  of  a  single  source  which 

Tt  ft 

sends  one  of  the  the  2  eguiprobable  mesages  j,  l£j£2  .  Each 

■w 

message  j  is  encoded  as  x.(j),  a  sequence  of  n  code  symbols  from 

'n 

the  alphabet  X*{1,2, . . .  ,U") .  The  channel  puts  out  y-,  a  sequence 
of  n  channel  symbols  from  the  alphabet  Y*{1,2, . . . ,W} ,.  according 

m 

to  the  channel  transition  probabilities  P(y  /x.).  The  decoder 

m  m 

assumes  that  the  channel  transition  probabilities  are  P'(y-/x.)> 
The  decoder  performs  maximum  ‘likelihood  (ML)  decoding  according 
to  the  assumed  channel  transition  probabilities.  In  other  words, 

A 

the  decoder  chooses  the  estimate  j  such  that 

P'ty'/Aj))  >  p’  (y*/x~(j)) 

for  all  j.  We  shall  restrict  ourselves  to  memoryless  chanr' „s  in 
this  section.  More  generally,  the  ML  decoder  may  assume  a  metric 

axt.  between  each  code  symbol  xeX  and  each  channel  symbol  yeY. 

'  <** 

Let  n*  be  the  number  of  k's  such  that  x^(j)»x  and  (The 

subscript  k  denotes  the  k-th  letter  of  a  sequence.)  The  ML 

A 

decoder  with  A»{ax^}  gives  the  estimate  j  which  maximizes  the 
sum-metric 


for  all  j 


Denoting  P(X«x),  P(Y»y)  and  P{X«x,Y-y)  by  and 

respectively,  we  have  the  following  theorem 


Theorem  4.1 

Reliable  communication  is  achievable  if  the  rate  of  encoding  R  is 
less  than 

C  *  max  I ' (X; Y)  (nats) 

P(X) 


in  which 


I ’ (X; Y)  -  H(X)  +  H(Y)  -  Hf (XY) 


H(X)  -  2  -p  In  p 

x  * 

H(Y)  -  l  -p  In  p 
*1  1  1 

and 


H(XY) 


max  H(F)  «  max  £-flM  In  t 

F  F  1  1 


(  rm(f  )  ) 


subject  to 


N 

>  o 

for  all  x,y 

Z 

i 

c*i  ■ }  p*i 

for  all  x 

L 

X 

f*i "  £  p»-) 

for  all  y 

L 

a“> 

This  theorem  is  proved  in  the  next  section. 

The  information  theoretic  term  I'  is  analogous  to  mutual 
information  in  classical  information  theory  defined  by 


I(X;Y)  -  2  In  (p^/[pxp^3) 


The  difference  is  that  I'  is  evaluated  at  the  f^’s  (instead  of 
the  p„y s)  which  minimize  the  value  of  I,  subject  to  a  set  of 
linear  contraints  involving  p^  and  ax^ .  C  is  the  maximum  value 
of  I'  for  various  input  distributions  P(X).  The  properties  of  I' 
and  H’  will  be  studied  in  section  4.3. 


There  is  reason  to  believe  that  C  is  indeed  the  capacity 
of  the  channel  with  a  decoder  that  assumes  the  metric  A.  Further 
work  is  required  to  prove  this  conjecture. 


4.2  Proof  for  achievability 

We  shov  that  random  coding  can  achieve  reliable 

communication  for  information  rate  R  less  than  C.  Let  the  2 

codewords  of  the  codebook  C  be  chosen  independently  codeword  by 

codeword  and  letter  by  letter  according  to  the  probability 

* o 

distribution  P(X) .  Furthermore,  we  require  each  codeword  x.(j)  to 
be  in  the  set 

•n  rt 

T  €<x  »  {X  :  n(px  -  t )  $  n  x  $  n(px  +  e  )  for  all  xj 

in  which  nx  is  the  number  of  letters  in  x  that  equals  x. 
Essentially,  each  codeword  must  have  a  typical  number  of  each 
letter  in  the  input  alphabet.  Let  C  be  the  ensemble  of  such  C's. 


For  the  sake  of  symmetry  and  obtaining  a  tight  bound,  the 
channel  sequence  y.  is  observed  by  the  decoder  only  if  y  is  in 
the  set 


Te^  ■  (y  **  *  "(P.j  “  €  )  *n^*n(p^  +  4)  for  all  y] 

in  which  n ^  is  the  number  of  letters  in  y *  that  equal  y,  and 

p«»p)t.P(y/x) .  The  decoder  declares  a  decoding  failure  when  y  is 

not  in  T#t^.  It  should  be  noted  that  the  decoder  cannot  decide 

m.  .  m  .  . 

whether  y  is  in  ,  since  it  does  not  know  P(y/x).  We  assume 

'  rf\  « 1 

that  a  genie  would  "black-out"  the  channel  if  y.  is  not  in  ^  . 
Obviously,  the  genie  makes  the  probability  of  decoding  error 


worse 


i: 


The  ensemble  error  probability  is 
Pa(J*J)  »  1  P(j)  Pc(J*j)  =  P4(J+j) 

with  the  last  equality  due  to  the  symmetry  of  the  random  code  in 
the  generation  of  each  codeword. 

A' 

Consider  the  threshold  decoder  which  finds  all  j  such  that 
where 

T*  (AT")!  ^.,>.^"(^.,..,-‘11 

(Note  that  the  threshold  decoder  cannot  be  implemented  since  we 
do  not  know  P(y/x)  which  determines  the  value  of  the  threshold.) 
If  no  such  j  or  more  than  one  such  j  exist,  the  threshold  decoder 
declares  a  decoding  failure.  Otherwise,  the  decoder  would  give 

.  A  a/ 

the  estimate  j«j.  Obviously,  the  output  of  the  threshold  decoder 
equals  that  of  the  ML  decoder  (which  is  implemented  and  chooses 

A 

the  j  with  the  maximum  sum-metric)  when  the  threshold  decoder 
does  not  declare  a  decoding  failure.  Consequently,  the  error 
probability  of  the  ML  decoder  is  upper  bounded  by  that  of  the 
threshold  decoder.  Therefore,  the  ensemble  error  probability 

A 

Pc(J*j)  is  upper  bounded  by 

Pt(  {(X^j),?")  *T€"  }  U  {Y**T4fj  1 

/v  *  «  n  'n  7» 

U  {some  j* j :  (X  <j),Y  )  &  T*,+  ,  Y  €•  Tf  ]  ) 

<  P4(  (x'"(j),YT,)4  T67*  )  +  Pc(*"4t^  ) 

+  I.  Pt(  (X^jKY*)*  T6^  ,  ) 

1*1  1 


\  •«  \ 


V  N.  *, 


The  inclusion  of  Y  6  in  the  last  term  tightens  the  upper 

bound,  as  we  shall  see  later.  The  first  two  terms  are  each  less 


than  €  provided  n  is  larger  than  some  ne ,  which  is  shown  in 
appendix  4.1  using  the  law  of  large  numbers.  The  last  term  equals 


2_  £  P  U^j)  y") 

j*j  (x  (}),y  ): 

41  V 

(X  ( j)  ,y^)  e  T€if 

ti  n 

y 


2.  Pc(x‘w(j)).Pt(y‘w) 

(x*(  j)  ,y*  ) : 

O  m 

(x  ( j )  ,y  )  €  t€  r 

-r»  <*i 

y  CT^ 

5  <p  n 

“■  Pc(*  (j))-Pt(y  ) 


a 


'n  ti  'n  tr\  -yi  n  „  -n 

where  K-{(x  ,y  )  :  (x  ,y  )  €  Tf<,  ,  x  eTf/,  ,  y  eTe,^ 

2  exp  (-n(H(X)  -*)  )  exp  (~n(H(Y)  -  e)  ) 

7*1  K 


1 


$  1 K |  expJ(nR)  exp  (-n(H(X)  -*)  )  exp  (-n(H(Y)  -6)  )  d 


(Note:  Throughout  this  chapter,  exp  is  the  natural  exponential 
and  the  entropy  H  is  defined  using  natural  logarithm.)  The 
product  in  a  is  due  to  the  fact  that  y  is  independent  of  x.(j) 

V 

if  jfj  since  different  codewords  are  chosen  independently;  b 
results  from  the  fact  that  the  construction  of  the  random  code 

Ol  +*  7)  7)  ~  rf\  7\ 

requires  x  (j)  to  be  in  T^*,  ,  hence  Pc(x.(j))»0  if  x.(3)f  T<x  ;  c 
follows  from  an  upper  bound  derived  in  appendix  4.2  on  P£(xm)  and 


Pff(y  )  for  all  x.$Tgx  and  y.4T^..; 
there  are  2  codewords. 


d  follows  from  the  fact  that 


The  key  of  the  proof  of  this  section  is  the  following 
upper  bound  on  the  cardinality  of  the  set  K. 


Theorem  4.2 

For  all  €>0,  there  exists  some  nft  such  that  if  n>n# ,  then 
/K |  *  exp  (n (H' (XY)  +6)) 
in  which 

H’ (XY)  -  max  H(F)  (  F  *  {ftM  1  ) 

F  “ 

subject  to 


0 


z 

i 

\ 

L 


for  all  x,y 
for  all  x 
for  all  y 


Proof 

By  the  definition  of  K,  (jT , y^Jf  R  if 

x”%T<<x  <«>  n(px-  t  )  $  nx*  n(px+  f)  for  all  x  4.2.1a 

y^T,^  <■>  n(p  -  €■)  $  nH  $  n(p  +  e)  for  all  y 


b 
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(X  <->  n<  2  P)t)  axi)  -  6  )  *  aXj 

Z  n.,,  *  n 
*•)  llJ 


d 


Let  {nVl>  ^  be  a  set  o£  n.u  for  all  xeX  and  yeY  such  that  4.2.1  is 

7 

satisfied.  Let  N  be  the  set  of  all  possible  {n 1.  For  each 
(nvti7»  there  are 


n!  /  T[  (n  XM)  l 

0^^  000 

different  (x  ,y  ).  Consequently 


|K| 


I  nl  /  Tf  (n  )! 
n  *•}  X1 


S  N 


max  n!  /  TT  (nwlJ)  J 
h/  *«J  7 


The  number  of  partitions  of  n  into  the  {n  j's  satisfying  4. 2. Id 
equals 


w-|  uw-i 


Ciiu.i  $  n 


uW 


Thus 


|Nl  $  n 


On  the  other  hand 


ni  /  TT  (n  )» 

XiJ  V 
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-fj 


«  exp(n  H(n„  /n) ) .exp( (n-n„  )  HCn.j/Jn-n,,  ))) 


,  exp((n-n„  -nlx)  H(n  l} /(n-ntl  -n, 


.  exp((n-n„  -n„ -. .  .-n- )  H(n;(-+I)  /(n-n|(  -n„  ...-n-j))). 
exp((n-n„  -n  -. .  ,-n  )  H(n  /(n-n,,  -n,a  . .  ) ) ) 

where  H(x)  *  -x  In  x  -  (1-x)  In  (1-x) 


exp(n  { H  ( f  ,,  )  +  (l-f  ,,  )H(fu  /(1-f  ,,  ))  + 


(l-f„  -f  u )H( £ ,j  /(!-£,,  -fa  ))  + . + 


(1— f,,  “f|j  — f4>^)H(  f  ( 1“  f  ii  “ +  + 

~  )H  ( fyi^/(  l-f|(  -f  n -...-f  utW'i)  ))1)  c 


exp  (n  H(F) ) 


where 


H(F)-  2  -f^  In  f  xtJ 


in  which  a  follows  from  a  rearrangement  of  the  product;  b  follows 
from  a  bound  on  binormial  coefficients  (part  b  of  problem  5.8  in 
[9]);  c  introduces  the  V  s;  and  d  follows  from  simple  algebraic 
manipulation. 

Dividing  all  the  inequalities  in  4.2.1  by  n,  and  choosing 
a  small  enough  f  give  the  constraints 


m 

& 


*  0 


for  all  x,y 


4.2.2a 


2. 

=  z 
•) 

for  all  x 

S 

=  I 

X 

for  all  y 

T 

x1 

t«, 

a  - 

2,  aXt) 

Consequently,  the  cardinality  of  K  is  upper  bounded  by 
uw” 

n  exp  (n  max  H(F)  ) 

F  satisfying  4.2.2 

■  exp  (n  (  max  H(F)  +UW  (Inn)  /n  )  ) 

F  satisfying  4.2,2 

Thus  for  every  €  >o,  there  exist  some  ne  satisfying 
UW  (In  (n# -1 ) )  /  (n,  -  1)  <  6  <  UW  (In  n(  )  /  n, 
such  that 

1  K l  <  exp  (n  H' (XY)  +  6 ) 
for  all  n>n  0. 

Q .  E .  D . 

Summarizing  the  results  so  far,  we  have 
<  2€  +  exp  (  -n  (H(X)  +  H(Y)  -  H' (XY)  -  R  -  3e)  ) 
*  26  +  exp  (  -n  (I '  (X;  Y)  -  R  -  36)  ) 


(R  in  nats.)  Therefore,  the  ensemble  error  probability  can  be 
made  arbitrarily  small  for  large  n  provided  R<I ' (X; Y)-3£.  This 
completes  the  proof  for  theorem  4.1. 


4.3  Properties  of  I'  and  H' 


This  section  derives  convexity  properties  and  bounds 
H’ (XY)  and  I’(X;Y),  which  are  defined  by 

H'  (XY)  =  max  £  -f  In  txu.  4.3.1 

F  “  J 

I'(X:Y)  «  min  Z  f  In  (f*./  (If.  £  f  *,. ) )  4.3.2 

both  subject  to  the  same  set  of  constraints  on  F  as  follows 


£  o 

for 

all 

x,y 

4.3.3a 

l 

•) 

s  ■  3  p»i 

for 

all 

X 

b 

f’l  "  * 

for 

all 

y 

c 

z 

X1 

fv,“J *  lp’“J a 

1 

d 

Theorem  4.3.1 


H(X)  +  H(Y)  £  H* (XY)  £  H(XY) 

0  $  1 ’ (X; Y)  $  I (X; Y) 

Proof 

Ignoring  the  constraint  4.3.3d,  we  have 
H'  (XY)  $  max  Z'f.  In  f, 

T  *l  l 


subject  to 


120 


*  0 


fx  “  Pjc 

S'  pi 

Consequently 

Z  - In  £*u 
*•)  ^ 


for  all  x,y 


for  all  x  (f  x  =  2  fX(^  ) 
for  all  y  (f ^  =  Z ) 


is  maximized  when  •f(j=PjC  *Ptj  for  all  x,  y.  Thus 


H’ (XY)  $  H(X)  +  H(Y) 


On  the  other  hand,  letting  fiu  satisfies  all 

1  1 

constraints  in  4.3.3  and  gives 

^  “P*ij 

Consequently 

H'  (XT)  -  max  L  -f ,,,  In  f_^  i  H(XT) 

F  *1  1  S 

The  inequalities  0$I ' (X; Y)£I (X? Y)  ’  follows  from  the 

definition 


I'(X;Y)  =  H(X)  +  H(Y)  -  H' (XY) 


Q.E.D. 


The  non-negativity  of  I'  is  very  pleasing.  In  contrast, 


the  quantity 


1  ■  Z  P»J  ln  <P'x,  /  <P'x  P  j  >  ) 


8 


/VV*  •  v*.  v  ■  ■  • 


_•  "  •  'j  "  •  'j*.  *  •  ‘  ‘  h  *»•»*»  »  ,  »  . 
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(where  p  is  the  actual  probability  and  p*  is  the  assumed 
1  *  ) 
probability)  satisfies  I ,^I  but  does  not  possess  the  non-negative 


property. 


Theorem  4.3.2 


min  H' (XY)  =  H(XY) 


max  I  * (X;  Y)  *  I(X;Y) 


which  are  achieved  when  a-.  *  In  p„u  . 

1  1 

Proof 


From  the  previous  theorem,  we  know  that  H’ (XY)^H(XY)  for  all  A. 

We  shall  show  that  for  A«{ln  p  H ' ( XY ) *H ( XY ) .  In  the 

J 


:imization  of  2.  ,-L*aln  ,fv„  . 


we  have 


Ij  ^  ln  v*. 


^  >Tm  ^  ^/P ' 


1  *■) 


S  Ij  P*"|  ^  WP **) 


H(XY) 


in  which  the  first  inequality  follows  from  the  well-known 
identity  [9]  that 


<*.  «T.  J".  *.  ^  V  r*  .* 


Z  P*  In  (<3;/?*)  $  ?  PA-  (q;/Px-  -  1)  -  0 


and  the  second  inequality  follows  from  the  constraint  4.3.3d 

£,,i  a*i *  V‘5  a“) 

Since  letting  f*^*?*^  satisfies  the  constraints  in  the  definition 
of  H' (XY)  and  gives 

Z  "f In  f ^  =  H(XY) 


we  have 


min  H' (XY)  »  H(XY) 


Q.E.D. 


Theorem  4.3.3 


H' (XY)  is  convex  fi  in  the  probability  distribution 


Proof 


ft.  ft 

Let  F.  be  an  {fXL.]  satisfying  the  constraints  in  (4.3.3)  for 

a  ft  ' 


P={P^1- 


(The 


index 


is  a  label).  Let 


AfV  +  (1-A)F*  =  {  Xfx^  +  (1- b)  fB(j ]  .  Let  F1.  and  F*  be  the  f!  and  F*  that 
achieves  the  maximum  of  H(f! ),  and  H(F*)  respectively.  Thus 

A  max  H(F*  )  +  (1-A)  max  H(Fl)  ' 


$  H( A  F*  +  (1-  X )  F  1  ) 


since  H(F)  is  a  convex  0  function  of  F  [9],  Now  \F  '+(1- X)?* 
satisfies  the  constraints  in  (4.3.3)  for  P {A +  ( 1 - A) ^ j  , 
consequently 

H(XF'  +  (1-A)  Fl  )  $  max  H(FA) 

X 


Q.E.D. 

As  a  corollary,  Hf (XY)  is  a  convex  A  function  of  {pxj  for  fixed 
This  follows  from  the  fact  that  is  a  linear 

function  of  p^  for  given  p^,x.  Similarly,  H' (XY)  is  a  convex  f) 
function  of  {p^fXj  for  fixed 

Theorem  4.3.4 

I '  (X;  Y)  is  a  convex  U  function  of  {p^;x1  for  fixed  . 

Proof 

The  reasoning  is  similar  to  that  of  theorem  4.3.3.  Define 

Ok 

f ^p^f^/f* .  Let  F^fjC  be  an  {f^}  satisfying  the  constraints  in 
(4.3.3)  for  PijjfcMPy*}  *  with  fixed  f^p^  for  all  x.  Let  F«jfJC  and 
F^jX  denote  the  F^f)C  and  F  ^  that  achieves  the  minimum  of 
I  (F )  and  I  (F ^jx  ) '  with 

I(FU1X  )  *  2  f  -  fUIy  In  (f  V|,  /  (  Z  f^.y  f »  )) 


li 


It  is  well  known  [9]  that  I(F^X)  is  a  convex  U  function  of 
for  fixed  {fxl*  Consequently 

A  min  I(F  )  +  (1-A)  min  I  (F  *  ) 

t  '  i  7 


F 


IX 


F 

r  if  I* 


KXf^  +  (l-A)  F^,t  ) 
A 

min  I  (Fujx  ) 


V 


V 1 


in  which  F*.x  satisfies  the  constraints  in  (4.3.3)  for 
A  i  '  a 

Q.B.D. 


For  the  binary  input  binary  output  channel  (that  is 

X»Y«{0,1)),  H*  and  I'  can  be  easily  found.  Let  the  actual 

probabilities  be  p*^  and  the  assumed  probabilities  be  p’^  (that 

is  ari.*ln.p.J.  We  have  the  following  result. 

“  7 


Theorem  4.3.5 

For  the  binary  input  binary  output  channel,  H' (XY)=H(XY) 
(consequently  I • (X? Y)«I (X;Y) )  if  (p, ,P0 9 )/(PrflP0 f )  and 
(P'ltP^a  )/(P/eP'«f )  are  both  larger  than  one  or  both  smaller  than 
1.  Otherwise,  H' (XY)=H(X)+H(Y)  (consequently  I'(X;Y)=0). 

Proof 


The  maximization  for  H' (XY)  involves  four  equality  constraints 
(from  4.3.3b  and  c),  one  of  which  is  redundant,  and  the 


inequality  constraint  of  4.3.3d.  There  are  four  unknowns,  namely 
the  fc^'s.  Without  the  inequality  constraint  4.3.3d,  H' (XY)  is 
maximized  with 


£  xu  -  (I  )  (  1  ) 

I  ^*».l  "  *-A  I  1 

giving  H' (XY)=H(X)+H(Y) .  We  shall  check  whether  these  particular 
values  of  verify  the  inequality  constraint.  Substituting 

the  values  of  £*i  in  *  into  the  constraint,  we  have 

^  ^ £  111  PxM  ^  ®  XH 


X  {  (  I  Px  ,  )  (  X  px.  )  -  px  1  In  p' 
xm  Xl)  x'.jj  x<)  J  7 


Consider  the  term  for  x*0  and  y*0  in  the  outermost  summation. 

’  {  OC  *  Pc,  ^  ^ Poo  +  P 10  ^  P  o*  5  P  00 

m  {  Poo  (poo  +  P»I  +  P«o  )  +  Po/P,o  ■  Pot,  1  ln  Poo 

*  {  P«0  U  “  P„  )  +  P0|  P/o  -  Poo  1  ln  Poo 

"  {  Po,P/o  “  PooP.ll  ln  P'co 

Similarly,  the  other  terms  become  -{P«  jP^-p^p,  ,)ln  p’01 , 
-{Po  iPio~PooPu1ln-P'io  »  and  {PotP(o"PooPf  i)ln-P’,f  •  Consequently, 

'  ^  tf  xi ' Px'j 1  a,i 
■  (Po,?,.  -  p..p,,  i in  t(p;.p'„)/(p.>:.)] 

which  is  larger  than  zero  if  (p;ep'lf  J/fp^p',* )>1  and 

(PooPir  )/(PorPro)<1  *  or  (PooP'n  )/(Pof P'/o )<^  an^ 

(PooPr  r  )/(Po  fPfO  )>!•  Otherwise,  the  maximum  of  X  , ln .  f*.. 
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occurs  when  the  inequality  constraint  is  satisfied  with  equality, 


because  the  set  of  f  satisfying  the  constraints  is  convex  and 

xy 

the  function  Z  *fw„  In  fv„  is  a  convex  fl  function  of  the  fxy's. 


xy 


xy 


For  such  case,  letting  £^=5^  trivially  satisfies  the 
constraints  4.3.3a-d  (with  the  constraint  4.3.3d  being  an 
equality).  This  solution  is  unique  since  there  are  four  unknowns 
in  four  linearly  independent  equations.  Consequently 
H’ (XY)»H(XY) . 


Q.E.D. 


For  an  0  inputs  W  outputs,  channel,  there  are  UW  unknowns 
constrained  by  0+W  equality  constraints  (U+W-l  of  which  are 
linearly  independent)  and  one  inequality  constraint  (not  counting 
the  non-negative  constraints  on  the  fx 
constraint  is  verified  for 


1 


’s).  If  the  inequality 


f  ^  y  ^  PX'Vj  ) 

then  H' (XY)=H(X)+H(Y)  and  I’(X;Y)-0.  Otherwise,  Z  .-f-toln.fit.  is 
maximized  over  an  UW-(U+W)-1  dimensional  space.  Thus  H’ (XY)  can 
have  any  value  in  between  H(X)+H(Y)  and  H(XY). 


4.4  Communication  m  the  presence  of  jamming 


There  are  three  types  of  constraints  that  may  be  imposed 
on  the  user  and  the  jammer.  A  per  codeword  constraint  imposes  an 
upper  bound  on  the  cost  of  each  codeword,  that  is 

b(x**)  =  2  b(x? )  £  n  B 

■fe 

A  average  codeword  constraint  imposes  an  upper  bound  on  the 
average  cost  of  transmission,  that  is 

£  P ( j )  b(x*(j))  $  n  B 
1 

A  per  letter  input  constraint  on  a  probability  distribution  P(X) 
is  given  by 

Z  P(x)  b(x)  *  B 

The  per  codeword  constraint  is  stronger  than  the  average  codeword 
constraint  and  is  more  sensible  practically.  We  shall  show  how 
these  three  constraints  are  related  in  proving  the  following 
coding  theorem.  We  assume  the  channel  is  memoryless  with 
stationary  statistics.  For  the  sequences  x.  and  z.  sent  by  the 
user  and  the  jammer  respectively,  and  the  channel  sequence  y. 


7?  rn  *71  rr  n  'n 

p(y  /*  z  )  «  /Xlt  z  ^ ) 


Furthermore,  the  decoder  assumes  a  certain  set  of  (a^l  *or 
decoding.  The  direct  part  and  the  converse  are  stated 
respectively  as 


Theorem  4.4.1  Direct  part 


* 

Assume  that  each  codeword  x  (j)  and  the  jamming  sequence  z 
satisfy  the  per  codeword  constraints 

bx(x  )  ^  n  B^ 

and 

br(z^)  $  n  Bz 

Reliable  communication  for  the  user  is  achievable  if 

R  <  min  max  I(Z;Y) 

P(Z  )€2l  P(X)tjC 

in  which  X  is  the  set  of  probability  distributions  on  X 
satisfying 

Z.  P(x)  bX(x>  *  BX 
xtX  *  * 

and  Z  is  the  set  of  probability  distributions  on  Z  satisfying 

Z  P(z)  b  (z)  $  B  _ 
x*Z  z  z 

Theorem  4.4.2  The  converse 

•n 

Assume  that  the  codeword  sequences  x  (j)  and  the  jamming  sequence 

<r\ 

z  :  satisfy  the  average  codeword  constraints 
£  P(j)  bv(xm(j))  £  n  B^ 

Z-  P(z^)  b^z*)  $  n  Bz 


the  jammer  can  make  the  bit  error  probability  of  the  user  bounded 
above  zero  if 

R  >  min  max  I (X; Y) 

p(z>a  p(x)e^ 

in  which  Tl  and  X  are  the  same  as  that  of  theorem  4.4.1 

The  direct  part  and  the  converse  differs  mainly  in  the 

m  m 

type  of  constraints  placed  on  the  sequences  x,  and  z-.  We  shall 
use  random  coding  for  the  jamming  sequence,  according  to  the 
probability  distribution  P(Z)£2.  We  do  not  use  the  per  codeword 
constraint  for  the  converse  for  the  following  reasons.  In 
proving  the  converse,  the  error  probability  has  to  be  lower 
bounded  for  all  n.  For  small  n,  it  is  likely  that  b^z”")  may 
exceed  nBz  on  a  per  codeword  basis.  This  problem  disappears  if  we 
use  an  average  codeword  constraint  instead.  Second,  the  per 
codeword  constraint  poses  a  diophantine  problem  for  small  n  as 
illustrated  in  the  following  example.  Consider  n*l,  Z={0,1"}, 
b^(0)*0,  bx(l)=l  and  B^l/4.  Consequently,  the  only  sequence 
satisfying  the  per  codeword  constraint  is  0.  Therefore,  the 
converse  would  not  be  valid  if  it  assumes  a  per  codeword 
constraint  instead.  Third,  the  sequence  z,  which  satisfies  the 
per  codeword  constraint  is  not  memoryless,  and  the  memoryless 
property  is  required  in  proving  the  converse.  Note  that  for 
large  n,  •  there  is  little  difference  between  using  the  per 
codeword  constraint  and  the  average  codeword  constraint. 


Proof  of  the  converse 


For  the  converse,  z-  is  generated  independently  letter  by  letter 
according  to  the  probability  distribution  P(Z)«Zl.  At  the  decoder, 
we  assume  that  a  genie  informs  the  decoder  about  P(Z)  so  that  the 
decoder  would  use  the  best  metric.  Consequently,  the  channel  with 
the  jammer  is  memoryless  and  has  known  and  stationary  statistics 
to  the  decoder.  For  given  P(Z),  it  is  well  known  that  the  error 
probability  of  the  user  is  lower  bounded  above  zero  if 

R  >  max  I  (X;  Y) 

P(X) 

By  choosing  the  P(Z)  that  minimizes  the  right  hand  side  of  the 
above  inequality,  the  bit  error  probability  of  the  user  is  lower 
bounded  above  zero  if 

R  >  min  max  I (X; Y) 

P(Z)C2.  P(X)«}C 

This  completes  the  proof  of  the  converse 

Q .  E  •  D  • 


Proof  of  the  direct  part 

For  the  direct  part,  we  want  to  show  that  some  codes  with  large  n 

<n 

achieve  an  almost  zero  error  probability  for  all  z  that  satisfy 
the  per  codeword  constraint.  The  decoder  is  the  ML  decoder 
described  in  section  4.1,  which  uses  the  decoding  metric  {axt,). 
However,  the  received  sequence  y.  for  a  channel  with  a  jammer  may 


not  be  memoryless  or  has  stationary  statistics,  in  contrast  with 
the  memoryless  channel  with  stationary  statistics  considered  in 
section  4.1.  For  each  z. ,  define  the  probability  distribution 
P(Z)  such  that  P(Z=z)=nt/n  (nz  is  the  number  of  z  occuring  in  z) . 

*T\ 

Since  z  satisfies  the  per  codeword  constraint  b^(z.)<nB2,  we 
have  P(Z)e2.  We  generate  the  codewords  x  by  random  coding 
according  to  the  probability  distribution  P(X)tX  For  large  n, 
the  probability  that  a  randomly  generated  codeword  would  not 
satisfy  the  per  codeword  constraint  is  negligibly  small.  The 
direct  part  shows  that  for  large  n,  the  ensemble  error 
probability  is  almost  zero  for  each  z n  satisfying  the  per 
codeword  constraint,  provided  that  the  rare  of  transmission  is 
less  than  the  capacity. 

Similar  to  section  4.1,  the  decoder  uses  a  set  of  metric 
{aT^}  and  performs  maximum  likelihood  decoding  according  to  the 
assumed  metric.  Since  both  the  random  coding  scheme  and  the 
decoding  scheme  are  symmetrical  for  each  location  of  the 
n-sequences,  the  ensemble  error  probability  is  invariant  for 
different  permutations  of  z- .  Consequently,  the  ensemble  error 
probability  for  a  given  z.  is  the  same  as  the  ensemble  error 
probability  over  random  permutations  of  z* .  Under  random 

Tl 

permutations  of  z.,  the  channel  with  the  jammer  may  be 
treated  as  memoryless®  and  has  stationary  statistics  with  channel 
transition  probabilities  given  by  P(y/x)  =  X  P(z)  P(y /  xz)  , 

@  In  fact,  theorem  4.1  remains  valid  for  channels  with  memory,  with 
P(y/x)  denoting  the  time-averaged  transition  probabilities. 


f-  •*.  =\  ‘  ‘ 


with  P(Z)  as  defined  in  the  previous  paragraph.  The  result  of 
section  4.2  is  applicable  and  the  ensemble  error  probability  for 
large  n  is  very  small  if  R<I'(X;Y). 


Since  I’  is  a  function  of  P(X),  P(Z)  and 
(abbreviated  as  Py,P2  and  A  respectively)  we  shall  denote  I'(X;Y) 
by  G(PX,PZ,A)  instead.  The  remainder  of  this  section  investigates 
G(Px,P2,A)  for  various  strategies  of  choosing  Px  and  A  by  the 
user  and  Pz  by  the  jammer.  The  user  would  try  to  choose  Px  and  A 
to  maximize  the  minimum  of  G  for  all  strategies  Pz  chosen  by  the 
jammer.  The  jammer  would  try  to  choose  Pj,  to  minimize  the  maximum 
of  G  for  all  strategies  P^  and  A  chosen  by  the  user.  Thus  proving 
the  direct  part  is  equivalent  to  proving  the  following  theorem. 


Theorem  4.4.3 

max  min  G(PX,PZ,A) 

Px«X  , A  Pze  2 

*  min  max  G(PX,PZ,A) 
p,a  px«/(a 

-  G(P*,P*,A*) 

■  min  max  I  (X;  Y) 

pre  a 

in  which  P*  and  P£  achieve  the  mini-max  of 


I (X; Y)  and 
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Proof 


By  the  saddle  point  theorem  [18], 

max  min  G{P*,Pe,A)  =  min  max  G(Py,Pz,A) 

PxeX'k  pse2  pze7£  p 

if  and  only  if  there  exists  some  (Px,P£,k*)  such  that  for  all  Px  , 
A  and  Px  , 


G(PX,P*,A) 

$  max  G(P’,  ,P*  A')  4.3.1a 

rt  Z> 

.  P^O^A’ 

-  G(P*,P*,A*)  b 

«  min  G  (P*  ,  Pj|  ,  A* )  c 

P^  6  a 

G(P*,PZ,A*)  d 


By  theorem  4.3.2, 


with  A*={ln 


we  have 


max  max  G(P',P*,A’)  *  max  I(X:Y)| 
p;t)[  A’  P(Z)=P* 


Furthermore 


max  I (X; Y)  |  £  I(X;Y)  |  £  G(P*  ,P|,A) 

P^X  P(Z)=P*  P(Z)  =  P* 

for  all  PX,A,  with  the  last  inequality  due  to  theorem  4.3.2.  Thi 
establishes  4.3.1a  and  b. 


4.3.1c  and  d  can  be  interpreted  as  follows.  If  the 
decoding  metric  is  optimal  for  the  optimal  jamming  strategy,  then 
the  decoder  using  the  same  metric  would  perform  better  (in  the 
sense  that  G(P^,PZ,A)  would  increase)  for  any  suboptimal  jamming 
strategy.  We  shall  show  that  G (P* ,?z  , A* )*G(P* ,P* , A* )  as  follows. 
G(P^,P2,A*)  equals 

min  Z  fx y  In  ) 

xy 

subject  to 

fx  *  P*  for  all  x 

f,j  *  p^  for  all  y 

and 

L  In  p*^  *  Z  PXy  In  p*  y 

Adding 

-  Z  p-  In  p*  -  Zp„  In  p* 
x  x  *  •)  )  1 


to  both  sides  of  the  inequality  constraint,  and  using  the  fact 
that  fx=px  and  f^»p^,  the  inequality  constraint  becomes 

Z.  In  (p*  /(p*  P*  ))  >,  I  p  In  (p*  /(p*  p*  )) 

1  1  x  ^  ,  xcj  xu)  X1  1 


or  equivalently 


i 


i 


k*. 

I 

f: 

i 

rfl 

s 

/ 

i 


u 

I 
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z 

X1 


f  f,  In  (p*  /p*  )  ^  y  p*  P  In  (p*  /p*  ) 

x  jx  Vi^/x  *•  *VX  /vj 


4.3.2 


On  the  other  hand. 


I  fXM  in  (f*^  /(f*  fj  )) 


*1 


^  f*  f  •) m  ln  ) 

>  ?.  fx  In  (p*,„  /p*  ) 


4.3.3a 


") 


«)jx 


with  4.3.3b  following  from 


Z  (  1"  <P%J  /(Pi  P^  )  >  -  1"  <«,,/«*  '  j)  >  1 


2^  fj^ln  (p*xij  /(j.  ) 


-  i) 


Combining  (4.3.2)  and  (4.3.3b),  we  have 
£  fyfl4iv  In  (f^/f*  )  *  Z  PtP  (p*  „  /p*  ) 


’I 


***  'i|jfx ' * *j  *  y*-  ‘v*  •; 


-ji 


£ 

t 


It  remains  to  show  that 


K 


£  p*  p*jik  in  <p;«  /p,>  * 


Z  p*  p*  In  (p*  /p*  ) 

P^X.  VP<f/>r  'Vj  > 

G(^*,P*  A*) 


such  that 


we  have 


/  p*  In  (p*  /p*  )  +  p*  )  (p  -  p*  )  ^ 

^  **  /y  tj  1  yx  1  Kyifiz  *7/*  ’  ' 


which  reduces  to 


21  p*  p+  In  (p*  /p*  )  >  T  p*  p*  In  (p*  /p*  ) 

/y<j  ;  '  yij/x  /ytj  ' 

This  completes  the  proof  for  4.3.3b  as  well  as  theorems  4.4.3  and 


4.5 


Multiple  accessing  with  incomplete  codebook  knowledge  and 


jamming 

The  extension  of  the  results  of  the  previous  section  to 
the  multiple  access  environment  is  conceptually  straight  forward. 
The  coding  theorem  in  this  section  can  be  proved  rigorously.  The 
proof  is  omitted  since  it  gives  no  new  insight  and  is  very 
complex  in  details  and  notations. 

The  asynchronous  M-user  multiple  access  channel  with 

incomplete  codebook  knowledge  and  jamming  is  modeled  as  follows. 

Let  there  be  M  asynchronous  sources  sending  independent  messages. 

♦iR*. 

The  user  i,  l^i^M,  uses  a  codebook  C.  containing  2... 
equiprobably  used  codewords  x^(j),  l£j$2  ..We  assume  that  there 
is  an  (M+l )-th  source,  which  tries  to  jam  the  M  users.  For  the  M 
users  and  the  jammer,  we  impose  the  following  constraints 
*» 

b  *(x?  )  -  1  b  :(xA-fc)  $  n  l«i£M+  » 

We  assume  that  the  receiver  m,  l^m$M,  is  assigned  to  obtain  an 
estimate  for  the  message  of  user  m.  In  the  process,  the  receiver 
m  may  have  to  estimate  the  messages  of  the  other  users.  The 
receiver  m,  besides  knowing  the  codebook  of  the  user  m,  also 
knows  some  codebooks  of  the  other  M-l  users.  The  set  of  indices 
of  C.'s  that  are  known  to  the  receiver  m  is  defined  as  I* .  The 
receiver  m  performs  joint  decoding  for  the  messages  of  the  users 
indexed  by  IM ,  while  assuming  a  metric  which  takes  into  account 
the  effect  of  the  channel  and  the  users  indexed  by 
Im«{1,2, . . .M3-IM .  We  say  that  (R,,...,RM)  is  in  the  capacity 


v . 


L39 


region  if  there  exists  codebooks  C,,...,CM  with  small  error 
probability  at  each  receiver. 


We  have  seen  in  the  previous  section  that  by  imposing  a 
specific  structure  on  the  decoder,  the  signal  of  the  jammer  has 
the  effect  of  a  memoryless  noise  with  stationary  statistics.  Thus 
for  the  receiver  m  with  1IW1=L,  the  problem  of  decoding  the  L 
messages  is  the  same  as  the  L-user  multiple  access  channel  with 
complete  codebook  knowledge,  with  the  other  M-L  users  plus  the 
jammer  treated  as  memoryless  noise  with  stationary  statistics.  It 
can  be  shown  [17]  by  random  coding  that  for  each 


P(x  ( , . . .  ,xM^j  ,y)  »  P,(x,)...  P^hi  (*  m*i  )  P(y/X,  •  •  ) 

reliable  communication  is  achievable  at  the  receiver  m  if 

for  all  12*.  «  I*,  with  m  e  Im,  and  .  Reliable  communication 

is  achieved  for  the  system  if  the  above  inequalities  are 
satisfied  for  all  IsmsM.  The  region  defined  by  the  above 

A/ 

inequalities  (with  R.>0  for  all  i)  is  denoted  as ft .  Consequently, 
the  achievable  rate  region  ft  is  given  by 


ft  -  O 


o  ft 


The  convex  hull  is  absent  ,in  the  above  characterization  because 
the  users  are  asynchronous.  Furthermore,  the  above  union  and 

A/ 

intersection  can  be  reversed  in  order  without  changing  ft .  This 
reversibility  results  from  a  convexity  argument  and  the  saddle 


w VwN*  V-V-V* .  *v-v*v*\ 


point  theorem. 


The  converse,  namely  that  the  system  has  bit  error 
probabilities  bounded  above  zero  for  any  code  outside  the  region 
dR,  ,  can  be  proved  by  standard  techniques  [16]  and  the  techniques 
used  in  chapter  2. 

/V 

Figure  4.5.1  illustrates  the  region  (R  for  M=2  N  and- 
various  degree  of  codebook  knowledge.  Obviously,  more  codebook 

/N/ 

knowledge  results  in  a  larger  &  . 


Appendix  4.1  Upper  bound  for  probability  of  atypicality 


We  want  to  show  that  for  all  £*>0,  there  exists  some 
such  that 

Ik  'll  . 

P,.(Y  4  T,  )  <  e 

c v  ^  6,L) 

if  n>n0  . 

Proof 

T>  .  m 

-  P(Y^T^  /x”(j)e  T  ) 

*  P(y\t€*  ,X"(j)eT^  )  /  P(x”(j)  ) 

^  P(Y^  Tt^  Tt#,  )/  (1  -  e  )  provided  n>some  n 

=  1/(1  -6)  Z  P(x'"y'M) 

(X  ,  y  ):x  €T<jX  ,y 

^  1/(1  -  4  )  Z.  P(y  )  provided  n>some  n 

$  e/(l  -  e  ) 

t 

-  e 

Hence  by  choosing  t  *  c'/(l+0>  we  have 

if  n  >  max  (n* ,n" )«n 


*  *  •  •  *  »  •  *  •  •  ■  -  »  •  «  *  »  *  •  -  ’  <  *  '"«*•*  ••'•'»‘*,.',.*»*,<.  B'.'*'  .*, 
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Appendix  4.2  Upper  bounds  for  Pff(x  (j))  and  Pff(y  ) 


We  want  to  upper  bound  Pc(x°'(j))  and  P<t(y'n).  Now 


P-.U  <j>> 


P(x  /x^eTg^  ) 


P(xn  and  x"*  e  )  /  P(xhfeT^  > 


$  l/(l-fc)  PU^  and  x’^Tj,*'  ) 


IT 

-  i/(i-o  Tf  P, 

X  x 

-n-  »»*/$,- o 

£  1/(1-  e)  Tf  px 

S  l/(l-€)  exp  (-n(H(X)  +  €  Iln  px  )  ) 

x 

<  exp  (-n(H(X)  -€*)  ) 

in  which  €*  is  small,  thus  giving  the  desired  upper  bond. 

The  proof  for  the  bound 
P<E(y"'>  S  exp  (-n(H(Y)  -  €’)) 
follows  the  same  argument. 


Chapter  5  Multiple  Accessing  for  the  OR  Channel 


One  convenient  way  of  communication  over  the  satellite 
channel  [4]  or  the  optical  channel [19]  is  for  each  user  to  signal  i 
pulses.  We  shall  model  such  channels  as  the  OR  channel  defined  as 
follows.  The  channel  alphabet  is  X={0,1}.  An  idle  user  transmits 
the  idle  symbol  0.  The  channel  output  alphabet  is  Y={0,ll.  The 
channel  transitions  are  given  by  y=0  if  all  x^.=0,  otherwise  y=l. 
The  slotting  of  the  channel  may  not  be  synchronized  among  the 
users.  However,  the  channel  can  be  modeled  as  symbol  synchronous 
by  limiting  the  temporal  resolution  and  using  a  discrete  estimate 
for  the  position  of  the  pulse. 

This  chapter  investigates  the  efficiency  of  the  OR  channel 
for  multiple  accessing.  The  communication  system  has  a  large 
number  of  users,  and  only  a  small  fraction  is  transmitting  at  a 
time.  We  shall  assume  that  the  system  is  asynchronous. 

This  chapter  is  divided  as  follows.  Section  5.1  evaluates 
the  channel  capacity  and  the  cutoff  rate  of  this  channel.  For  the 
case  with  joint  decoding,  the  sum-capacity  of  one  (which  is  also 
the  sum-capacity  of  the  synchronous  channel)  can  be  achieved.  For 
the  case  without  joint  decoding,  the  sum-capacity  and  the 
sum-cutoff  rate  both  equal  In  2  (or  .69).  Section  5.2  analyses  a 
convolutional  encoding  and  decoding  scheme.  The  sum-cutoff  rate 
is  used  to  obtain  a  convenient  upper  bound  for  the  error 
probability  as  a  function  of  the  constraint  length  of  the 
convolutional  code  and  the  sum-throughput  of  the  users.  We  define 
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complexity  as  n,  the  number  of  possibly  correct  states  in  the 
trellis  in  steady  state.  Section  5.2  also  shows  ,  with  an 
independence  assumption,  that  P(n)  decreases  exponentially  in  n 
for  large  n.  Hence  the  probability  of  buffer  overflow  as  n 
exceeds  the  size  of  a  modestly  large  buffer  (that  registers  the 
set  of  possible  states)  is  almost  negligible.  This  decoding 
algorithm  shows  that  a  sum-throughput  close  to  the  sum-cutoff 
rate  of  0.69  is  achievable  in  practice. 


5.1  Capacity  and  cutoff  rate. 


For  the  asynchronous  multiple  access  channel  with  joint 
decoding,  the  capacity  region  is  given  by 


R  -  U  (R 

p,(x,  ),..., ph(xm) 

iPU,  . .  .xHy)*P,  (x, ) . .  .PM  (xM)  .P(tj/x(  ...xM) 
in  which  (R  ,  ,  ...,R^)€  ft.  if 


I  Bi  <  'lUiUiv  » 

for  all 


n  c  { i , . . . , m) ,  /i*  f 
h  *  { i , . . . ,  m  }  -  _Q 

For  l^i^M,  let  P.(x- *0)-q.  and  P. (x. «l)*l-q. .  It  can  be  readily 

*■  K  A.  A  A. 

shown  that 

"Wun  ’WJun  '  '  (  Tk  qi  ’  H<  ■Irnqi  ' 

i  tJl  iCfi 

The  right  hand  side  of  the  above  equation  can  be  interpreted  as 
the  product  of  the  probability  that  the  users  in  the  set  /I  do 
not  put  a  pulse  in  a  position  with  the  binary  entropy  of  the 
output  given  that  the  users  in  the  set  Jfl  do  not  put  a  pulse  in 

A» 

the  position.  The  region  &  (for  a  specific  set  of 
P, (X, ) , . . i , PM(XH) )  is  shown  in  figure  5.1.1  for  the  cases  of  M=2 

V 

and  M*3.  It  is  noteworthy  that  many  fc  are  required  to  form  the 
capacity  region  shown  in  figure  5.1.1. 


It  can  be  readily  shown  that  the  point  (1/M, 1/M, . . . f 1/M) 
_  «/M 

is  in  the  <N  with  q . = ( 1/2 ) . >  for  all  i.  Thus  a  sum-throughput  of 
1  is  achievable  for  the  asynchronous  OR  channel  with  joint 
decoding.  Obviously,  the  sum-throughput  cannot  exceed  1,  the 
maximum  entropy  per  letter  for  the  output  alphabet. 

For  the  case  without  joint  decoding,  the  capacity  region 

equals 

&  -  u  £ 

P,  (X  PM(XM) 

:P(x,  ...xMy )-?,(*,)  ...PHUH)P(y/x, . .  ,xM) 
in  which  (R,  ,...,R*  )  £  (ft  if 
R-  <KX.  ;Y) 


for  all  i  such  that  l<i^M.  The  capacity  region  for  the 
asynchronous  2-user  OR  channel  without  joint  decoding  is 
illustrated  in  figure  5.1.2.  It  can  be  readily  shown  that 
m  M 

I(X-;Y)  -  H(  TT  q  •  )  -  q;  H(  TT  q  •  /q  •  ) 

in  which  P(x.-0)Bq..  Consider  Rr,  the  sum  of  rates  for  the  M 

JL  M  ' 

users  with  q.-q  for  all  i. 


M  M-) 

RT  •  M  H(q  )  -  M  q  H(q  ) 


Substituting  q  *  r  and  M=l/x  in  the  above  expression  gives 


Kx2;.Y/x1r 


Figure  5.1,2  The  asynchronous  OR  channel  without  joint  decoding 


For  large  M,  the  above  expression  can  be  evaluated  using 
L'Hospital's  rule  with  x  going  to  zero,  subsequently  giving 

R_(r)  «  lim  R,(r,x)  ■  In  r  In  (1-r)  nats 
x+o 

The  maximum  of  RT  (r)  over  r  occurs  at  r*l/2  when 

RT(r)  -  In  1/2  In  1/2  nats  *  In  2  bits  *  .69  bits 

Thus  the  sum-capacity  of  the  asynchronous  OR  channel  without 
joint  decoding  is  .69. 

The  sum-cutoff  rate  for  the  asynchronous  OR  channel 
treating  the  other  users  as  memoryless  noise  happens  to  be  the 
same  as  the  capacity  of  the  channel.  In  the  evaluation  of  the 
cutoff  rate,  the  modulation  symbols  have  to  be  defined  carefully. 
The  sum-cutoff  rate  for  the  pulse  position  modulation  is  .69 
while  that  for  on-off  keying  is  .59.  The  difference  between  these 
two  sum-cutoff  rate  is  due  to  the  difference  of  the  modulation 
symbols.  For  pulse  position  modulation,  a  symbol  is  a  frame  of  N 
slots  with  a  pulse  in  one  of  the  slots.  The  set  of  modulation 
symbols  consists  of  N  different  pulse  positions.  On  the  other 
hand,  the  set  of  modulation  symbols  of  on-off  keying  is  the  set 
{0,1}  representing  the  absence  or  presence  of  a  pulse  in  a  slot. 

For  pulse  position  modulation,  we  shall  approximate  the 
other  users  as  memoryless  (slot-wise)  noise  in  the  evaluation  of 


the  cutoff  rate  for  each  user.  For  pulse  position  modulation, 


the  cutoff  rate  per  frame  for  user  i  is 

R C  »  -In  (  1  (  £  Q(x)  J P(y/x)  )*  ) 

*>  x 


nats/PPM  symbol 


in  which  Q(x)»l/N  and  y  is  an  N-dimensional  binary  vector.  Let  h 
be  the  probability  that  a  slot  contains  a  pulse  due  to  the  users 
other  than  user  i.  It  can  be  easily  shown  that  for  pulse  position 
modulation , 


M-t  */»» 

h  -  1  -  (1  -  1/N)  *  1  -  e 


1  -  e 


for  large  M  and  N;  and  T=M/N.  The  channel  transition  probability 
as  viewed  by  user  i  (with  the  memoryless  approximation)  is  given 


N-fc  t-1 

P(y/Xi)  »  (l-h)  h 


for  those  y  with  k  pulses,  one  of  which  falling  into  the  pulse 
position  of  the  code  symbol  x..  P(y/X')=0  if  there  is  no  pulse  in 
the  position  of  y  that  corresponds  to  the  pulse  position  of  x^. . 
Thus 

i  **  N-1t  fr-|  x 

R.  «  -  In  (  I  C.  (k/N  (1-h)  h  )  ) 

w  * 

hi  ,  t  2  k 

-  -  In  (  (1-h)  /<h/\J  )  2-  k  wCk  [h/(l-h))  ) 

isi 

The  summation  above  can  be  readily  shown  to  be  equal  to 


i  *-2  tJ~l 

N(N-l)  (h/[l-h] )  (l/[l-h])  +  N  <h/[l-h])  ( 1/C 1-h] ) 
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Assuming  large  N  (say  >  10),  the  second  term  in  the  above 

expression  is  negligible  compared  with  the  first  term.  Further 
algebraic  manipulation  gives 

L  -T 

Rd  *  -In  h  8}  -In  (1-e  )  nats/PPM  symbol 

The  sum-cutoff  rate  per  slot  is  defined  as 

C  -T 

R  T  =  -M/N  R.  »  -T  In  (1-e  ) 

which  is  maximized  when  T»ln  2  with 


R T  ■  -In  2  In  (1/2)  nats/slot  «  In  2  bits/slot 

Thus,  the  sum-cutoff  rate  is  the  same  as  the  sum-capacity,  with 
the  memory less  approximation. 

For  on-off  keying,  the  channel  transition  probabilities 
for  user  i  is  given  in  figure  5.1.3  (assuming  qi  =  q  for  all 
users).  The  sum-cutoff  rate  is  then 

R  ■  -M  In  (  Z  (  Z  Q(x)  J P(y/x)  )Z )  nats/slot 

0  t)  x. 

2  K-l 

■  -M  In  (1  -  2q  +  2q  +  2q(l-q)(l-q  )  ) 

M 

Substituting  q  *r  and  x  =  1/M  gives 

R*(r,x)  «  -In  (1  -  2r  +  2rJ*  +  2rX(l-r1t)  (1-r . r  X  )  )  /x 


Applying  L’Hospital's  rule  for  large  M  gives 

RT(r)  «  lim  R_(r,x)  ■  2  (1-  Jl-r  )  log  r 

x-*o  * 


bits/slot 


with  a  maximum  at  r=.421  when  R  =  .597,  which  is  less  than  that 
for  pulse  position  modulation  channel.  We  are  unable  to  give 
convincing  reasons  to  explain  the  difference. 


5.2 


Convolutional  codes  for  the  multiple  access  OR  channel 


This  section  analyses  the  throughput,  error  probability 
and  decoding  complexity  of  a  specific  coding  and  decoding  scheme. 
Each  user  is  assigned  a  binary  input  N-ary  output  convolutional 
encoder  as  shown  in  figure  1.2.1.  The  tap  gains  a^.'s  are  integers 
chosen  in  the  set  {0,1,2, ... fN-l) .  Decoding  is  performed  as 
follows.  We  assume  that  the  transmitter-receiver  pair  is 
code-synchronized.  Thus  the  code  sequence  sent  by  the  transmitter 
traverses  a  path  in  the  binary  trellis  (the  solid  line  in  figure 
5.2.1).  The  pulses  of  the  other  users  may  turn  on  branches  from 
the  correct  path.  The  state  at  the  end  of  a  turned-on  branch  is 
stored  in  a  buffer,  together  with  the  path  that  leads  to  the 
state.  We  call  such  a  state  an  active  state.  A  path  may  be 
further  extended  if  there  is  a  channel  pulse  at  the  pulse 
position  associated  with  the  two  branches  from  the  active  state. 


Decoding  errors  may  result  when  an  incorrect  path  merges 
with  the  correct  path.  The  ensemble  average  of  the  expected 
number  of  bit  errors  for  an  error  event  starting  at  a  given  time 


is  upper  bounded  in  [10]  by 
P.  <  2  /  [1  -  2  ) 


-T  k  -r  a 

-  (1  -  e  )  /  (2  e  -  1) 


i  -T 

in  which  k  is  the  constraint  length  and  R.  =  -log2(l-e  ) 
bits/PPM  symbol  which  is  derived  in  section  5.1  using  the 
memoryless  noise  assumption.  (Note:  the  rate  of  the  encoder  is  1 


number  of  active  state 
at  a  time  =>  4 


correct  path 
incorrect  path 

active  state 


Figure  5.2.1  The  decoding  algorithm 
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bit/PPM  symbol  because  the  tree  is  binary.)  This  upper  bound  is 
plotted  in  figure  5.2.2  as  a  function  of  the  sum-throughput  T  for 
various  values  of  k.  From  the  plot,  it  is  necessary  to  use  a 
large  k  to  make  Pfc  small. 

The  complexity  of  decoding  is  measured  by  the  random 
variable  n,  the  number  of  active  states  in  steady  state.  Assume 
that  the  decoder  has  a  buffer  that  can  register  at  most  L  active 
states.  The  buffer  overflows  when  the  number  of  active  state 
exceeds  L,  consequently  the  correct  path  may  not  be  extended. 
Thus  a  decoding  failure  may  also  occur  when  the  buffer  overflows. 
We  shall  demonstrate  that  P(n)  decreases  exponentially  in  n  for 
large  n,  using  an  independence  assumption.  Hence  the  probability 
of  buffer  overflow  can  be  made  very  small  using  a  modestly  large 
buffer.-  In  contrast,  the  probability  of  buffer  overflow  for 
sequential  decoding  on  typical  channel  decreases  like  a  power  of 
1/L,  which  is  much  worse  than  the  exponential  decrease  in  L  for 
our  case. 

Let  n^  be  the  number  of  active  states  at  time  j  in  the 
trellis.  The  number  of  active  states  for  the  trellis  is  upper 
bounded  by  the  number  of  active  states  when  the  trellis  is 
considered  as  a  tree,  by  double  counting  the  active  states 
resulting  from  two  merging  paths.  For  large  k,  path  merging  is 
highly  unlikely,  hence  treating  the  trellis  as  a  tree  would  give 
a  tight  bound.  Initially,  there  is  only  one  active  state,  that 
is  nj*l.  There  is  always  an  active  state  propagated  by  the 
correct  path  at  each  time.  Without  loss  of  generality,  this  state 


j.  — •  *  • 


t: 


■i 


M 
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is  labeled  1.  The  other  active  states  are  labeled  i,  2^i^n^. .  Let 

e.  .  be  the  number  of  states  at  time  j+1  that  are  activated  by 
*  *  1 

the  active  state  i  at  time  j.  Let  p  be  the  probability  that  a 

pulse  due  to  the  other  users  falls  in  any  slot.  (For  M  users  and 

-T 

N-ary  PPM  symbols,  recall  that  p=l-e  .  .•  ,  where  T  is  the  channel 
throughput  M/N  (bits/slot)).  The  e.;.’s  for  given  j  are  not 
independent  since  the  e.  ,’s  depend  on  the  received  channel 

A,  } 

sequence.  However,  the  e.^.'s  would  be  very  weakly  coupled  for 

*•  •  1 

large  M  and  N,  in  the  sense  that  knowing  one  e.  .  tells  little 
about  the  other  e.-.'s.  The  e...’s  have  probability 

*'1  A,} 


distributions. 


p<»„1  -1)  -  J'p 
*2)  =  P 

and  for  2$i$n- 

P(e-  =0)  =  (1-p)1 


P(e.^  =1)  =  2p  (1-p) 


and 


P(efc,j  “2 )  =  pl 

We  shall  drop  the  superfluous  subscript  j  henceforth.  The 
probability  generating  functions  for  P(e^)  are  defined  as 

V,(s)  *  P(e.  =  k)  s  =  U"P>  S  +  p  Sa 


(1-p) •  +  2p(l-p)  s  +  p.  s 


*  [*p  +  (1-p)] 
for 

The  sequence  of  random  variables  n, ,n4 , . . . ,n^, . 
constitutes  a  discrete  time  branching  process  [15].  Furthermore 

n  i”  '  *<• 

with  probability  generating  function 


sv>,  («) 


2  P(n.„  -  k)  s 


p*  to 


Z  Z  P(n  *  */  n-,‘  *  m)  p(nT  ■  m)  s 

k:0  to:,  1  1  1 


•O  So 


Z  Z  P(n  •  *  m)  P(e  +  ...+eM  *  k)  s 

Jt-tf  * 


9 

I 

MS0 

P(n ^  *  m) 

I 

k--0 

p(e,  +  ---+em  = 

k )  s 

r* 

« 

z 

P(n •  =  m) 

TT 

V.(s)  , 

MiJ 

1 

JL 

M 

- 

I 

■to 

P(n •  =  m) 

[U 

-p)  s  +  p  s2  ] 

[sp  + 

2(to-l) 


*  s/[sp  +  (l-p)3  2.  P(n  •  =  m)  [sp  +  (1-p)] 

Mi*  A  J 


*  s/[ sp  +  (1-p)]  S  -([sp  +  (1-p)]  ) 


in  which  the  product  TT  V. (s)  results  from  the  assumption  that  the 

i  =0  1 

e^'s  are  independent  random  variables,  and  the  fact  that  the 
probability  generating  function  of  a  sum  of  independent  random 
variables  equals  the  product  of  the  probability  generating 
functions  of  the  random  variables.  In  steady  state,  dropping  the 
subscripts  j  and  j+1  gives 


S(s)  =  s/[sp  +  (1-p)]  S( [sp  +  (1-p)]  ) 


If  S(s)<«a  for  some  s>l,  P(n)  must  decreases  faster  than  s.  for 
large  n.  Therefore,  we  are  interested  in  the  value  of  S(s)  for 
s>l.  For  the  sake  of  providing  insights  about  the  function  S(s), 
we  give  a  recursive  algorithm  to  compute  S(s).  Let 

sk  *  tsi»,p + 


Thus 


<R  "  (1-P)  )  / 


The  plot  of  s  versus  sk  is  given  in  figure  5.2.3.  Thus,  we 
have  the  recursive  relation 

S(sk«  >  -  sk-  /-Fk  S<V 

z 

From  figure  5.2.3,  s^.^s^  if  l<s<(  (l-p)/p)  . .  The 

recursive  relationship  gives  the  value  of  S  for  progressively 
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larger  s,  which  inches  towards  the  value  ((l-p)/p)  as  k 
increases.  Thus,  if  we  start  with  sa=l+£  with  known  S(s0),  we  can 
compute  all  S(s^),  k=l,2,....  By  varying  the  value  of  €  (which 
is  assumed  small),  all  values  of  S(s)  in  the  open  interval 
(1 , ( (l-p)/p) 1  )  can  be  computed.  From  the  recursive  relation,  it 
is  obvious  that  S(s)  is  finite  for  s  in  this  open  interval 
provided  S(l+€)  is  finite  for  some  €>0,  since  any  s  in  the 
interval  can  be  reached  in  a  finite  number  of  steps  of  recursion. 
When  s  approaches  ((l-p)/p)  ,  the  increase  of  sfc  for  each  step  is 
small,  while  each  successive  value  of  S  increases  by  a  factor  of 
s w,/Vs£*(l-p)/p.  Therefore  S  becomes  unbounded  as  s  approaches 
<(l-p)/p)S.  The  remaining  question  is  how  to  compute.  S(l+fi-). 
Obviously  S(l)=l.  Differentiating  the  equation  for  S(s)  with 
respect  to  s,  and  putting  s=l  (so  that  S'(l)=n),  we  have 

n  ■  (l-p)/(l-2p) 

Thus  for  small  e ,  we  have 

S(l+e)  =  1  +  <(l-p)/(l-2p))  6 

Differentiating  twice  with  respect  to  s,  we  obtain 
S”(l)  =  S ’  (1 )  (  2p(  1+p)  )/  (  (l-2p)(l+2p)  ) 

Hence,  the  variance  of  n  is 

0Vl  *  n*  -  (S)2  *  (  p(l-p)  )/  (  (l-2p)*.(l+2p)  ) 

n 

The  values  of  n  and  are  plotted  in  figure  5.2.4  as  a  function 
of  the  sum- throughput  T. 


*  .  ' 
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Unfortunately,  this  algorithm  is  numerically  unstable 
Furthermore,  we  have  not  shown  that  S(l+S)  necessarily  exists 
The  following  upper  bound  on  S(s)  would  resolve  these  problems. 

Theorem  5.2.1 

Sfc(s)  $  (s*-l)s/(s*-s) 

for  s*>s£l,  s*=( (l-p)/p)1  >  1 

Proof  (by  induction) 

S( (s)  *  s  £  (s*-l)s/(s*-s) 

for  s*>s£l 

Suppose 

Sfc(s)  $  (s*-l)s/(s*-s) 
for  s*>s£l,  then 
-  (s> 

-  sp/[sp*(l-p)]  St([sp+(l-p)]1) 

$  sp/[sp+(l-p)3  (s*-l)[sp+(i-p)]2/(s*-[sp+(l-p)]2) 

-  Sp  (s*-l)  (sp+(l-p) )/[ (s*-s) (sp*+  (i-pa)3 
S  (s*-l)s/(s*-s) 


if  s*>s  and 


■w 


p(sp+(l-p) )/(sp  +(l-p  ))  $  1 


(  <=>  p  $  1  ) 


Hence  by  induction,  the  theorem  is  true 


Q  .  E  •  D  • 


Moreover,  (s)  is  monotonically  increasing  in  k,  as  stated  in 
Theorem  5.2.2 


s  »„<*)  * 


for  k£l  and  s*>s>l 


Proof  (also  by  induction) 


S,(s)-S  ,(s) 

[s*p  +  s(l~p) J  -  s 


£  0 


for  s^l. 


Suppose 


Sk„(s>  *  sk  <s> 


for  s*>s£l.  Then 


Sk„  <S>  -  W«> 


sp/[sp+(l-p)  ]  {Sk+|  ([Sp+d-p)]1 )  -  Sk  ( [sp+(l-p)  ]  )  } 


for  s*>s£l.  Hence 


S  k+,(s)  »  Sk(s) 

is  true  for  all  k  by  induction. 

Q  •  E  •  D  • 

Since  S^ts)  is  upper  bounded  and  increases  monotonically  in  k, 

Sy(s)  converges  to  an  S(s),  the  steady  state  probability 

generating  function.  Thus,  a  steady  state  probability 

distribution  P(n)  exists,  and  the  tail  of  P(n)  decreases  faster 
a* 

than  (p/(l-p))  for  large  n.  The  upper  bound  (s*-l)s/(s*-s)  is 
plotted  in  figure  5.2.5  for  p*.4  and  .45,  when  the 
•un-throughputs  are  .51  and  .60  respectively. 

The  above  scheme  also  achieves  the  sum  capacity  of  .69. 

a 

S(l+€)  is  finite  for  some  «>0  provided  ( (l-p)/p) ,>1 ,  or  p<l/2. 
Therefore,  reliable  communication  can  be  achieved  if  the 

-7 

sum-throughput  T  satisfies  p*l-e  <1/2,  or  equivalently  T<.69. 

Renouncing  the  independence  assumption  leads  to  a  Pareto 
distribution  [10]  for  P(n).  The  same  phenomenon  happens  to  the 
collision  channel,  which  is  analysed  in  chapter  7. 


Chapter  6  Multiple  Accessing  for  the  Spread  Spectrum  Channel 


This  chapter  considers  the  use  of  error  correction  codes 
on  top  of  Psuedo-Noise  (PN)  sequence  coding  for  code  division 
multiple  accessing  of  the  asynchronous  spread  spectrum  channel. 
The  channel  is  found  to  have  a  maximum  throughput  of  .72  and  .36 
based  on  the  evaluation  of  channel  capacity  and  cut-off  rate 
respectively.  More  generally,  these  two  values  are  derived  for 
given  n/N  in  which  n  is  the  length  of  the  PH  sequence  and  N  the 
number  of  simultaneous  users.  It  is  found  that  to  achieve  the 
maximum  throughput  for  a  given  bandwidth  expansion,  n  should  be 
small.  This  implies  that  coding  schemes  with  short  PH  sequences 
and  low  rate  codes  are  superior  in  terms  of  throughput  or  antijam 
capability.  The  extreme  case  of  n=l  corresponds  to  using  a  very 
low  rate  code  with  no  PN  seguence  coding.  Convolutional  codes  are 
recommended  and  analysed  for  their  error  rate  and  decoding 
complexity. 

This  chapter  is  divided  as  follows.  Faction  6.1  describes 
the  coding  scheme  and  subsequently  models  the  channel.  A  Gaussian 
approximation  is  used  in  the  process.  Section  6.2  then  evaluates 
the  total  capacity  and  cut-off  rate  for  the  active  users.  For 
convolutional  codes  and  sequential  decoding,  it  is  shown  that  the 


maximum  throughput  is  (log.e)/4 


.36.  Section  6.3  gives  an 


analysis  for  the  bit  error  probability  and  decoding  complexity  as 
a  function  of  the  constraint  length  and  rate  of  the  convolutional 
code,  as  well  as  the  length  of  the  PH  sequence. 
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6.1 


Modeling 


Let  N  be  the  number  of  active  users.  Each  user  i,  l^i^N, 
is  assigned  an  n  bit  PN  sequence  r  €  {±1  } .  A  chip  is 
defined  as  as  an  interval  of  length  d.  The  PN  sequence  carrier  is 
the  function 


n 

•i ( t )  -  Z  c;<|  s(t-jd) 


in  which  s(t)  equals  1  if  0$t$d  and  0  elsewhere.  Each  user 
encodes  the  binary  (0  or  1)  data  stream  {...u-  .  u  •  u  :  ,  ...1  into 
the  antipodal  (1  or  -1)  code  stream  {...x^_,x^  xi#,  The 
rate  of  encoding  is  r  ($1).  A  coding  scheme  using  convolutional 
code  is  shown  in  figure  1.1.3.  The  signal  sent  by  each  user  is 


x-(t) 

xi  k  cx,^t-^n^+DiC) 


in  which  P;  is  the  power  for  transmitter  i,  and  is  the  delay 
for  user  i,  which  is  randomly  distributed  in  [0,nd].  The  channel 
output  is 

N 

y ( t )  -  £  x.(t) 

The  receiver  i  first  of  all  acquires  synchronization  with 
transmitter  i  by  estimating  ,  which  we  assume  can  be  estimated 
accurately  by  some  means.  Demodulation  is  performed  by  match 
filtering  and  the  values 


N  -lh  | 

»i.k  -  <"  %.  1  /a 


•P£  +  (k+0  r>4 

y(t)  c-(t-knd+D-)  dt 

A  ^ 


,9} 


are  obtained.  The  y^fc  's  are  subsequently  quantized.  We  shall 
assume  either  hard  quantization  or  no  quantization.  The  channel, 
as  viewed  by  user  i,  can  be  characterized  by  P(y.  -,/x.  .). 

A  A. j  K 

Assuming  that  the  PN  sequences  are  generated  randomly  with 
equiprobable  use  of  1  and  -1,  it  follows  readily  that  the  random 
variable  y-  .  has  mean 

«  i/z 

E(y;,k  /x*,k  "  m)  “  m  <npc/  p^) 

and  variance 

E<<y4K  -h.k  >*/** k  -  m)  -  1 

for  m  =  1  or  -1.  Since  y^ ^  is  the  sum  of  a  large  number  of 
random  variables,  the  statistics  of  y^  is  approximately  Gaussian 
by  the  law  of  large  number.  Hence  by  defining 
H 

N..v/2  -  £  V" 

1*C 

and  dropping  the  subscript  k,  we  have  the  approximation 
P(y;/*£  ■  m)  -  exp  (~(y;  -  m./_2P; /N##£  )  /2)  /  JT* 

(Strictly  speaking,  the  y.-.'s  are  not  mutually  independent.  This 

A,  K 

dependence,  however,  is  weak  and  diminishes  as  the  number  of 
users  N  becomes  large.) 


The  channel  (actually  the  channel  plus  quantizer)  can  be 
modeled  by  the  channel  transition  diagram  in  figure  6.1.1  and 
figure  6.1.2  for  the  cases  of  no  quantization  and  hard 
quantization  respectively.  Figure  6.1.1  is  essentially  a  binary 


input  Gaussian  output  channel,  whereas  figure  6.1.2  is  a  Binary 
Symmetric  Channel  (BSC)  with  cross-over  probability 
P=Q(/2pT/n7 in  which 


Q(z) 


-z 

-I  ■ 


-x*/z  _ _ _ 

e  dx/JTn 


Consider  the  case  of  no  coding  on  top  of  the  PN  sequence  (i.e., 
x .  ,  *1  if  U;  .  =  1  and  x  ;  .  =  -1  if  u  •  ,=  0).  The  channel  is 

Xf  k  fc  *,  M  tiK 

therefore  the  BSC  of  figure  6.1.2.  Assuming  equal  power  for  all 
users,  the  bit  error  probability  upper  bounded  by 

Q<  j2P;/N#/i  )  -  Q(7n/(N -7)  )  =  Q(t',/2) 

in  which  T  is  the  total  throughput  (N/n  bits/chip).  This  upper 
bound  is  listed  in  figure  .  6.1.3.  It  is  readily  seen  that  the 
throughput  has  to  be  reduced  substantially  for  tolerable  error 

probability.  An  immediate  conclusion  is  that  coding  should  be 

used  on  top  of  PN  sequence  coding  to  achieve  a  lower  bit  error 
probability. 
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6.2  Capacity  and  cutoff  rates 


The  capacity  of  a  channel  is  the  upper  bound  on  the  rate 
of  reliable  communication.  The  cutoff  rate  is  the  value  of  the 
random  coding  exponent  function  with  p-  1  [9].  When  sequential 
decoding  is  used,  the  cutoff  rate  is  the  maximum  rate  of 
communication  over  the  channel  with  bounded  average  number  of 
computation  for  decoding  an  information  bit.  For  the  BSC  of 
figure  6.1.2,  the  cutoff  rate  and  capacity  (in  bits/code  symbol) 
are  derived  in  [10],  giving 


R  o/t  -  1  “  l°g2U  +  V*  p^l  -  p^)  ) 
C*  -  1  -  H  (pj  . 


in  which 


P;»  Q(/2P./N#/C  ) 

and  H  is  the  binary  entropy  function.  For  the  binary  input 
Gaussian  output  channel,  we  have  [io] 

R  0,1  u  1  ■  1°9i(l  +  e  ) 


C  l  »  log  27re 


in  which 


-JV 


(y)  log2P  (y)  dy 


p*(y)  *  (pJT  (y)  +  p‘  (y))  /  2 


p  m  <y )  s  -  y i/  x.  =  m)  =  exp  (- [y^-  m./2P.  /N~  ] Z/  2)  /TT* 

Me  shall  assume  henceforth  that  the  transmitting  power  are  equal 
for  all  users.  Define  p  =  n/N,  the  ratio  of  the  length  of  the  PN 
sequence  to  the  total  number  of  active  user.  Summing  over  the  N 
users,  and  normalizing  by  n,  the  number  of  chips  in  a  code  symbol 

,  the  sum-capacity  and  sum-cutoff  rate  (in  bits  per  chip)  for 

the  BSC  are 

RT(f)  »  N/n  Ri0  =  f'  (1  -  log  2(1  +  /4p(l  -  p)  )  ) 

CT(p)  -  N/n  CA-  *  fX  (1  -  H  (p)  ) 

in  which 

p  *  Q  (Jn/{N-1)  )  a*  Q  (  ft'*)  for  large  N. 

For  the  binary  input  Gaussian  channel, 

~i  -fi/* 

RT(y3)  -  p  (1  -  log2(l  +  e  )  ) 

-i 

Ct(£)  =  P  (~  loga27xe  -J  P(y)  log^Pfy)  dy  ) 
in  which 

p(y)  -  (  p  ,  (y)  +  p_,  (y)  )  /  2 

and 

Pw(y)  -  exp  (-[y  -  m|S/23  /  2  )  / /T* 

These  four  functions  of  p  are  listed  in  figure  6.2.1  and  6.2.2. 
All  four  functions  are  observed  to  be  monotonically  decreasing  in 
£.  It  can  be  readily  shown  that  all  four  converges  to  the 
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function  1/jg  for  large  values  of  ft  .  Asymptotic  evaluation  of 
these  functions  for  vanishing  values  of  p  gives 

Rt  =  2n  ^°9  e  *  *2296 
CT  =  log  e  *  .4592 

for  the  BSC  and 


Rf  *  ^  log*  e  *  .3607 
Cr  *  ^  log  e  *  .7213 


@ 

for  the  binary  input  Gaussian  channel.  It  is  noteworthy  that 
small  values  of  ft  correspond  to  very  noisy  channels  in  figures 
6.1.1  and  6.1.2,  when  the  cutoff  rate  can  be  shown  to  be  half  the 
capacity.  In  fact,  both  RT  and  CT  for  diminishingly  small  ft  can 
be  derived  alternatively  using  a  very  noisy  channel  model  [9,10]. 
The  asymptotic  result  for  large  ft  suggests  that  using  long  PN 
sequences  decreases  the  sum  of  the  capacity  and  cutoff  rate.  The 
lesson  is  that  we  should  use  very  low  rate  encoders  and  short  PN 
sequences,  so  that  ft  can  be  made  small.  The  smallest  value  of  ft 
is  achieved  for  n*l  which  corresponds  to  using  a  very  low  rate 
code  with  no  PN  sequence  coding. 


@  It  is  well-known  that  hard  quantization  reduces  the  capacity  by  a  factor  of 


7X/2  [9 #10],  as  shewn  in  this  example. 


6.3 


Error  probability  and  decoding  complexity 


This  section  studies  the  upper  bound  on  error  probability 
as  a  function  of  complexity.  For  constraint  length  k  and  rate  1/v 
convolutional  codes,  the  ensemble  average  of  the  expected  number 
of  bit  errors  for  an  error  event  starting  at  a  given  time  [10]  is 
upper  bounded  by 


Pfc  <  2  /  [1  "  2  ]z 


-kvpxrtf) 

-2  /  [1  -  2  3* 


mr 


[(1  +  e  )/2]  /[I  -  2  (<1  +  e  )/2)  ]  * 


The  last  equality  follows  from 
RT</&)  -  /Ml  -  log^l+e'^*)) 

derived  in  section  6.2  for  the  binary. input  Gaussian  channel, and- the 
total  throughput  for  the  N  users  is 


T  -  (N/n) .  (1/v)  -  l/(vj&  ) 

This  upper  bound  on  Pfc  versus  T,  the  degree  of  congestion  of  the 
channel,  is  shown  in  figure  6.3.1  for  k=10  and  v  from  2  to  10. 
The  result  shows  that  using  a  large  v  gives  far  superior  error 
probability  for  given  k.  Consequently,  using  a  small  p  also 
improves  the  antijam  capability  of  a  user.  By  setting  to  zero 
the  denominator  of  the  upper  bound  on  P^,  we  have 


SEMI-LOGARITHMIC 
KEUFFEL  A  ESSER  CO. 


This  value  of  T  versus  v  is  plotted  in  figure  6.3.2,  together 
with  the  corresponding  value  of p  (=l/(Tv)).  It  is  observed  that  a 
large  v  (consequently  a  small  yS)  increases  the  value  of  T.  A 
value  of  v  around  5  would  give  a  value  of  T  close  to  the  maximum 

Rr 

Since  =^RT(^) ,  the  error  probability  is  decreasing  at  a 
rate  that  is  exponential  in  -nvkRT(^3)/N.  It  is  noteworthy  that  nv 
is  the  number  of  chips  for  the  output  on  a  transition  of  the 
trellis.  Consider  the  use  of  Viterbi  decoding,  for  which  decoding 
complexity  is  measured  by  k  (the  trellis  therefore  has  2 
states),  for  fixed  nv  and  N.  Consider  changing  the  value  of  yfi  . 
(Since  (5*  n/N  and  nv  is  fixed,  a  smaller  £  would  imply  a  smaller 
n  and  a  larger  v.)  Due  to  the  fact  that  RT(|S)  is  maximized  by 
small  values  of  £ ,  the  complexity  k  is  minimized  for  small  values 
of  p  at  a  fixed  error  probabi  lity. 

Using  small  /S  ,  however,  may  require  a  decoder  complexity 
larger  than  that  reflected  by  the  value  of  k  due  to  two  reasons. 
First,  decoding  can  be  performed  in  units  of  code  symbols  since 
circuits  on  a  chip  are  available  for  demodulating  the  entire  PN 
sequence  carrier  (which  is  considered  as  a  code  symbol  in  the 
channel  transition  diagram).  A  small  fb  would  require  a  large  v, 
the  number  of  code  symbols  on  each  transition  of  the  trellis. 
Second,  the  use  of  small  requires  finer  quantization  for 
decoding  since  the  channel  (figure  6.1.1)  becomes  more  noisy.  The 
merging  of  the  four  functions  in  figures  6.2.1  and  6.2.2  in  fact 
shows  that  rough  quantization  may  be  used  with  little  effect  on  R 


or  CT  for  large  yS  . 


Since  user  interference  is  severe  in  such  system,  using  a 
long  constraint  length  code  is  crucial  for  low  error  probability. 
Unfortunately,  Viterbi  decoding  is  too  complex  to  implement  for 
constraint  lengths  beyond  10.  We  suggests  searching  for  good 
convolutional  codes  to  decrease  error  probability  and  increase 
total  throughput.  (Our  previous  bound  on  Pfc  is  a  random  coding 
bound  for  an  ensemble  of  codes  chosen  at  random.)  A  directory  of 
good  encoders  would  be  built  up,  from  which  each  subscriber  would 
choose  an  encoder.  The  maximum  size  of  the  directory,  which 
determines  the  number  of  subscribers  for  the  system,  depends  on 
the  density  of  good  codes  times  the  volume  of  the  code  space.  We 
believe  that  searching  for  good  convolutional  encoders [17]  is  a 
better  and  more  tractable  way  to  improve  code  orthogonality  than 
trying  to  design  PN  sequences  with  good  auto-correlation  and  low 
cross-correlation,  which  is  difficult  due  to  code  asynchronism. 

A  long  PN  sequence  may  also  make  synchronization  at  the 
receiver  easier  to  achieve.  It  should  be  noted  that  two 
synchronizations  have  to  be  performed,  namely  the  synchronization 
of  the  PN  sequence  carrier  and  the  synchronization  of  the 
convolutional  encoder,  up  the  the  code  symbols.  For  the  case  of 
n*l,  there  is  no  need  for  PN  sequence  synchronization.  In 
practice,  a  synchronization  signal  can  be  sent  on  top  of  the 
coded  signal  for  easy  code  synchronization. 

Priorities  among  the  users  can  be  set  up  by  the 
authorization  of  transmitting  power  levels  for  the  users.  Using  a 


higher  transmitting  power,  a  user  may  choose  to  increase  its  data 
rate  or  decrease  its  error  rate.  The  analysis  in  this  chapter 
can  be  carried  further  to  show  that  the  capacity  for  each  user  is 
directly  proportional  to  its  transmitting  power.  Fading,  due  to 
distance  and  poor  weather  condition,  can  also  cause  variations  of 
signal  power  and  results  in  dramatically  deteriorated  system 
performance.  The  system,  in  order  to  operate  reliably  in  most 
conditions,  would  have  to  include  a  power  margin  that  reduces  the 
total  throughput  of  the  system.  Also,  power  monitoring  for 
fairness  is  difficult  in  practice. 


It  has  been  suggested  [5]  that  CDMA  for  the  spread 
spectrum  channel  should  operate  at  a  throughput  of  about  10 
percent  for  reasonable  error  performance.  The  analysis  in  this 
paper  gives  a  maximum  throughputof  36  percent  based  on  the*  cutoff 
rate.  In  practice,  the  throughput  would  have  to  be  reduced  for  a 
smaller  decoding  complexity  required  for  a  tolerable  bit  error 
rate.  We  have  shown  that  a  throughput  of  20  percent  is  feasible 
in  practice. 


Chapter  7 


Multiple  Accessing  for  the  Collision  Channel 


For  the  collision  channel,  each  user  sends  packets,  which 
are  destroyed  if  they  overlap.  We  shall  consider  the  use  of  such 
a  channel  for  multiple  accessing,  when  each  user  redundantly 
encodes  the  information  to  be  sent.  The  resulting  data  sequence 
is  blocked  into  packets  for  transmission.  The  receiver  collects 
all  the  packets  from  the  corresponding  transmitter  and  ignores 
all  other  packets  and  collisions.  These  collected  packets  are 
subsequently  decoded.  Section  7.1  describes  the  channel  model  and 
derives  the  capacity  region.  Section  7.2  analyses  the  performance 
of  block  codes.  Section  7.3  proposes  a  convolutional  coding  and 
interleaving  scheme,  and  compute  the  sum-cutoff  rate  of  the 
channel.  Section  7.4  analyses  a  decoding  scheme  that  keep  track 
of  all  active  paths  in  the  trellis  and  evaluates  its  decoding 
complexity.  Section  7.5  generalizes  sections  7.1  and  7.3  to  the 
case  when  the  channel  is  corrupted  by  additive  Gaussian  noise.  It 
shows  that  substantial  power  saving  is  gained,  at  hardly  any 
extra  cost,  by  the  redundant  coding  that  is  originally  intended 
to  correct  erasures  due  to  packet  collisions. 


7.1 


Modeling  and  channel  capacity 


The  slotted  (or  packet  synchronized)  noiseless  collision 
channel  is  defined  as  follows.  The  code  symbol  alphabet  is  X^  = 
{0,1,2,  ...  ,2  '}  for  each  user,  in  which  0  represents  an  idle  and 
the  letters  1  to  2  each  represents  a  packet  of  length  n  bits. 
The  channel  transition  is  given  by  y=x.  if  x.=0  for  all  i^j,  and 

1  4. 

y*erasure  if  more  than  one  x.fO.  The  channel  transition  diagram 

A» 

for  user  i  is  shown  in  figure  7.1.1. 


Without  code  synchronization  nor  joint  decoding,  the 
capacity  region  R  is  given  by 

&  -  U  R 

.  P(x, . . .x„y ) 

;P(x|  . .  .xMy)=P,(x(  ) . .  •IJt(xM)P(y/xi  . .  .xH) 
in  which  (R,,...,RM)€  &  iff 


R  <  I(X-;Y)  for  all  i. 

v 

Due  to  the  symmetry  of  the  letters  1  to  2  ,  it  is  apparent  that 

■n 

each  of  the  2  nonzero  packets  should  be  equiprobable  for 
achieving  capacity.  Therefore,  we  shall  abbreviate  p.  a  by  p. , 

A/  ^  M 

and  p. . .  by  (l-p.)/2  for  all  l£ j^2  . .  It  can  be  shown  after  some 
tedious  evaluation  that 


M 

I (X^;Y)  »  n  (l-p;)/p;  TT  p.  +  0(1)  bits/slot 

r-'  1 

-  d-p;)/px  TT  p; 

.jSI  i 


n-bi t/slot 


erasure 


Transition  probabilities 
p(no  packet  sent  by  the  other  M-l  users) 

P (exactly  one  other  user  sends  the  message  yj 
P(two  or  more  packets  sent  by  the  other  M-l  users) 
P(one  or  more  packets  sent  by  the  other  M-l  users) 

Channel  transition  diagram  for  the  collision  channel 
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for  large  n.  We  would  like  to  have  a  characterization  of  4? 
without  the  p^  's.  Consider  the  equations 
M 

*  U-Pp/P l  H  P; 

r-'  * 

for  all  i.  Differentiating  the  R^'s  with  respect  to  the  p^.  's 
gives 


M  mm 


At  the  boundary  of  (R,  ,  it  is  necessary  that  the  vector 

(dR, ,dR4 , . . . ,dRM)  can  only  be  tangential  to  the  surface  of  the 

boundary  for  all  vectors  (dp, , dpt , . . . , dpM) .  Therefore,  it  is 

necessary  that  the  second  matrix  on  the  right  hand  side  be 

singular.-  Let  c^  be  the  i-th  column  vector  of  that  matrix.  Linear 

dependence  of  these  vectors  implies  that  Z,cC.c.=0  for  some  «(. '  s 

i  *  ”*•  “  a, 

which  are  not  all  zeros.  Consequently, 
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(?  *i  )  (1  "  P.)  -  C*, 

>  * 

^  )  (1  ”  p;  -  oLx 


Summing  the  elements  of  the  above  vector,  we  have 

(  ?  <*i  )  (  %  (1  -  P;  )  )  -  (  )  «  0 

*  9  /  1  1 

which  implies 

t2,+<3a+***+<3M*1  (as  f  0  from  *) 

where  q.-1-p^.  Thus,  the  sum  of  the  probabilities  of  sending  a 
packets  for  the  H  users  equals  one  at  the  boundary  of  & . 
Therefore,  the  boundary  of  fi,  satisfies  the  following  set  of 

equations, 

M 

5,  (i_pi)  -  1 

RjL  -  (l-pj/p,.  7T  P  •  ,  l$i$M 

i*‘  1 

For  M«2,  the  elimination  of  p(  and  p2  from  the  above  equations 
gives 

R,  +  Jkj,  ■  1 

which  is  shown  in  figure  1.2.1  d.  It  is  very  difficult  to  solve 
the  above  equations  in  useful  forms  for  M>2.  Instead,  we  shall 
obtain  the  sum  of  the  rates  along  the  main  diagonal  (by  having 


UvV 


•  \  *  «  jv  .v 


equal  p^  ’s).  The  sum  has  a  maximum  value  of  (1-1/M)  .  For 
large  M,  this  value  converges  to  e"1  ,  which  is  the  same  as  the 
capacity  of  the  slotted  Aloha  channel. 

The  capacity  region  for  the  case  with  joint  decoding 
(given  in  chapter  2)  is  the  same  as  that  without  joint  decoding 
if  the  length  of  packet  n  is  large  enough.  This  is  due  to  the 
fact  that  collisions  result  in  erasures  that  are  totally  useless 
for  decoding,  as  well  as  the  fact  that  the  packets  for  the  other 
messages  may  be  ignored  without  much  loss  in  the  decoding  of 
one's  message. 


The  remainder  of  this  section  examines  how  unslotted 
packets  affect  the  capacity  of  the  channel.  Let  1-p.  be  the 

A* 

probability  that  user  i  transmits  a  packet  in  the  slot.  The 

probability  (conditioned  on  a  transmission  in  a  slot)  that  a 

packet  for  user  i  does  not  collide  with  the  packets  of  user  j  is 
a 

p. . ,  because  the  packet  for  user  i  may  collide  with  a  packet  in 
the  two  slots  that  overlap  with  the  packet  of  user  i.  Hence  the 
throughput  for  user  i  is  given  by 


(1 


w  !  "  , 
-pj/p;  npi 


Assume  equal  rates  for  all  users  so  that  p.  =p.=p;  the  sum  of  the 

*  1 

rates  is  then 
M 

T  -  2  T.  -  M  (1-p)  p 
**■1  *■ 


-l) 


which  is  maximized  when  p=l-l/( 2M-1 ) ,  with 


T  -  M/(2M-1)  (1-1/(2M-1)) 


For  large  M,  T  approaches  e  '  /2,  which  is  the  same  as  the 
capacity  for  the  pure  Aloha  scheme. 

In  practice,  we  may  be  able  to  recover  the  front  part  of  a 
packet  up  to  the  point  when  it  starts  to  collide  with  another 
packet,  as  shown  in  figure  7.1.2.  No  portion  of  a  packet  which 
starts  at  a  time  when  another  packet  is  being  transmitted  may  be 
recovered,  since  the  preamble  (for  identification  and  receiver 
synchronization),  which  is  placed  at  the  beginning  of  the  packet, 
is  lost  in  the  collision.  We  assume  the  length  of  the  preamble  to 
be  small  compared  with  the  length  of  the  packet.  Appendix  7.1 
shows  that  the  maximum  sum  of  the  throughput  equals  1/4. 

Massey  [6]  has  shown  that  the  capacity  of  the  .  unslotted 

channel  is  the  same  as  that  of  the  slotted  channel.  Consider  the 

grouping  of  u  packets  into  a  super-packet  as  shown  in  figure 

7.1.3.  The  key  idea  is  that  even  though  a  packet  is  totally  lost 

through  partial  overlapping  with  other  packets,  part  of  a 

super-packet  can  be  retrieved  when  it  overlaps  partially  with 

other  super-packets,  as  shown  in  figure  7.1.3.  The  proof  of  the 
-i 

e  sum-throughput  is  shown  in  Appendix  7.2. 


recovered  portion  of  a  packet 


portion  of  a  collided  packet  that  cannot  be  recovered 


an  empty  slot 


Figure  7.1.2  Partially  recoverable  packets 
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7.2  Block  coding  scheme 


This  section  examines  the  use  of  block  codes  for  the 
slotted  collision  channel.  At  the  end  of  the  section,  we  show  how 
super-packet ing  may  be  used  for  the  unslotted  channel. 


Reed-Solomon  codes  are  especially  effective  for  the 
correction  of  erasures.  Each  packet,  which  consists  of  n  bits, 

71 

may  be  treated  as  an  element  of  the  Galois  field  GF(2  ).  A 
Reed-Solomon  code,  defined  on  GF(2*),  has  codewords  f »( f f.) 
satisfying  the  equations  (for  some  integers  m,  d,  and  distinct 
z, ,z, , . . .  .z^GFU*’) . ) 
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Consequently,  the  Reed-Solomon  code  has  code  rate  (n-d+l)/n. 
Define  the  distance  between  two  codewords  f*(f,  ,...,fk  )  and 
f’«(f,’  , . . . ,  f  (t'  )  as  the  total  number  of  places  where  f^  f  f  ±  . 
We  shall  give  a  proof  for  the  well-known  fact  that  this 
Reed-Solomon  code  has  distance  at  least  d. 


Proof 

Suppose  this  is  not  true.  Then  there  exist  two  codewords  f 


'*  *  * • ■ * 


and  f'  such  that  f''»  f  +  ( —  f * )  has  less  than  d  nonzero  entries. 
Since  Zf'*0,  it  follows  that  there  exists  a  linear  combination 
of  less  than  d  of  the  column  vectors  of  Z  which  equals  the  zero 
vector.  This  is  impossible  because  Z  is  of  full  rank,  namely  d-1, 
which  follows  from  the  fact  that  Z  is  a  Vandermonde  matrix  with 


izi  -  t  TT  *•  3  C  TT  (*:-*!-)]  t  o 

i-l  i>j  1 


since  the  z^  's  are  distinct. 


Q •  £ .  D. 


Massey  [6]  used  the  Reed-Solomon  code  to  prove  that  zero 
error  probability  can  be  achieved  with  each  user  transmitting  at 
an  information  rate  of  (1-1/M)  /M  n-bit  per  slot.  The  scheme 

M  **) 

involves  a  clever  way  for  each  user  to  put  M  packets  in  a 

time  frame  of  M  slots,  such  that  exactly  (M-l)  packets  would 
be  collision  free  for  each  user,  no  matter  how  the  frames  of  the 
users  are  relatively  shifted  (slot-wise).  Therefore,  a 
Reed-Solomon  code  of  rate  (M-l)  «...  /M  .  would  suffice  to  correct 
all  the  erasures.  The  throughput  for  each  user  is  then 

M-l  M  M-l  M-l 

(M-l)  .  /M.  .  ® (1-1/M)  ,/M.  The  sum-throughput  is  (1-1/M)  , 

which  is  the  same  as  the  capacity  derived  in  section  7.1. 


The  scheme  of  Massey  is  inadequate  for  implementation  if 
the  number  of  users  (M)  is  large  or  variable.  However, 
Reed-Solomon  codes  are  still  effective  for  such  a  channel.  Assume 
that  a  user  puts  a  packet  in  a  slot  with  probability  q=l-p  and 
the  k  consecutive  packets  he  puts  on  the  channel  comprise  a 
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codeword  for  a  rate  r*h/k  Reed-Solomon  code.  Consequently,  the 
information  rate  for  the  user  is  q.h/k  n-bit  per  slot.  The 
sum-throughput  is  T*Mqr  n-bit  per  slot.  The  probability  of 
erasure  for  a  packet  is 

MH  M-)  -t /  r 
£  -  1  -  (1-q)  ■  1  -  (1  -  T/Mr)  »  1  -  e 


for  large  M.  The  probability  of  block  error  is  upper  bounded  by 


k 


*c; 


k-i 

<1-0 


Using  an  upper  bound  [9]  on  the  sum  of  the  tail  of  a  binormial 
distribution  ,  we  have 


i  It- A 

h  (l-*)/(h{l-e)  -  (k-h)«  )  kCk  eA(l-e) 

kHeir)  h  |c -A 

r  (l~€)/(r (1-4)  -  (1-r)  4  )  e  fc  (1-e) 


-rA-  -t Jr  - T/r 

r  (1-e  )/(re  -(l-r)(l-e  )  ) 


-r  -U-r)  -T 
[r  (1-r)  (1-e  * 


r  <»-»•;  k 
e  ) 


This  upper  bound  P^(T,r,k)  is  plotted  as  a  function  of  T  for 
several  codes  in  figure  7.2.1.  The  error  probability  is  not  very 
satisfactory  for  these  codes.  We  shall  show  that  convolutional 
codes  have  a  more  superior  error  probability. 


If  the  channel  is  unslotted,  groups  of  u  Reed-Solomon  code 
symbols  may  be  grouped  together  to  form  a  superpacket.  For  large 
u,  the  probability  of  erasure  for  a  particular  packet  is  the  same 
as  that  of  the  unslotted  channel.  Thus  the  throughput  of  the 


7.3 


Convolutional  coding  scheme 


We  now  turn  to  the  use  of  convolutional  codes.  The 
encoding  scheme  is  shown  in  figure  7.3.1.  The  outputs  of  the  rate 
r=l/v  constraint  length  k  code  (v=2  in  the  figure)  is  fetched 
alternately  by  the  switch  S,  .  Switch  puts  successive  bits 
into  successive  packets  in  a  stack  of  m  packets.  When  all  m 
packets  are  filled,  each  packet  would  be  transmitted  after  a 
random  delay. 

For  the  unslotted  channel,  the  data  sequence  is  fed  in 
parallel  (figure  7.3.2)  into  u  encoder-interleavers  (the  square 
box  in  figure  7.3.1),  and  the  u  packets  from  the 
encoder-interleavers  are  grouped  together  to  form  a  super-packet. 
The  super-packet  is  transmitted  after  a  random  delay. 

Decoding  delay  results  from  the  fact  that  the  data  bits 
have  to  be  deinterleaved  before  decoding.  This  delay  is  of  the 
order  mn.  To  minimize  delay,  we  may  use  a  smaller  stack  or 
shorten  the  length  of  a  packet.  Shortening  the  length  of  a 
packet,  however,  would  incur  a  higher  fraction  of  preamble 
overhead  in  the  message  body.  The  number  of  packets  in  the  stack 
should  be  at  least  k.v,  so  that  each  bit  in  a  packet  is  generated 
from  a  different  set  of  information  bits.  If  m<k.v,  error 
probability  would  increase  because  erasures  are  clustered 
together.  Choosing  m  slightly  larger  than  k.v  would  suffice  since 
the  likelihood  of  long  error  paths  in  the  trellis  is  small. 


V  vv* 
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Decoding  delay  due  to  deinterleaving  can  be  reduced  by  the 
use  of  convolutional  interleaving  as  shown  in  figure  7.3.3.  Each 
packet  is  filled  with  n/m  more  bits  than  the  next  packet  in  the 
stack.  The  packet  at  the  top  of  the  stack  is  popped  off  the  stack 
when  the  packet  is  filled.  An  empty  packet  is  added  to  the  bottom 
of  the  stack  when  the  packet  at  the  top  is  popped.  The  popped 
packet  is  transmitted  after  a  random  delay,  which  is  smaller  than 
the  packet  interarrival  time.  Using  this  scheme,  delay  incurred 
and  buffer  required  are  about  half  that  of  the  previous 
interleaving  scheme.  The  convolutional  interleaving  scheme, 
however,  has  a  problem  with  the  first  few  packets  of  a  message, 
since  the  packet  at  the  top  of  the  stack  is  empty  at  the 
beginning.  It  seems  that  the  first  few  packets  would  have  to  be 
filled  in  the  manner  shown  in  figure  7.3.4,  subsequently  wasting 
m/2  packets  per  message.  This  waste  is  tolerable  only  if  the 
message  is  long.  The  choice  between  these  two  schemes  depends  on 
the  length  of  the  message  and  the  amount  of  decoding  delay  that 
can  be  tolerated. 


We  now  give  a  random  coding  bound  [10],  based  on  the 
cutoff  rate,  for  the  error  probability.  Consider  the 
deinterleaved  data  sequence  at  the  receiver.  The  occurence  of 
erasures  in  the  deinterleaved  sequence  is  not  memoryless  in  the 
sense  that  erasures,  if  they  occur,  are  periodic.  If  we  use  a 
decoding  metric  that  is  time-  invariant  and  a  convolutional 
encoder  of  long  constraint  length,  these  erasures  may  be  viewed 
as  being  memoryless.  Hence,  we  may  treat  the  channel  as  a  binary 
erasure  channel  shown  in  figure  7.3.5  if  we  impose  the 


portion 


Figure  7.3.4  Length  of  the  first  few  packets  of  a  message 
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time-invariant  structure  on  the  decoder.  The  capacity  of  the 
binary  erasure  channel  is  equal  to  C»l-€.  Consequently,  the 
sum-capacity  is 

CT  ■  max  M  q  (l-€r)  »  max  M  q  (1-q)  *  (1-1/M) 

t  i 

which  approaches  e  '  for  large  M.  The  cutoff  rate  for  the  binary 
erasure  channel  is 

R. 

-  -log  ,(2  (f  P(x)  yj  P(y/jT)  )*  ) 

-  -log  a(  (1  +  6)/2  ) 

-  -log  x  (1  -  e  Tr/2) 

-Tir 

since  &«l-e  for  large  M.  The  sum  of  the  cutoff  rates  for  the 
M  users  is 

-TV 

RT»MqR#  -  -  Tv  log^l  -  e  /2) 

since  Mq/v«T.  Reliable  communication  using  sequential  decoding  is 
achievable  for  T  up  to  RT.  Consequently  the  maximum 
sum-throughput  satisfies 

-tv 

T  »  -  Tv  logz(l  -  e  /2) 
or 

-  v 

T  «  — 1/v  In  (2  -  2  X  2  ) 


T  has  a  maximum  value  of  0.295096  when  v=3.014.  For  practical 
purposes,  v  is  an  integer  so  the  best  code  rate  is  1/3,  with 
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T-0. 295093.  For  v«2,  the  maximum  T  is  0.2674. 

Errors  in  decoding  can  arise  when  an  incorrect  path  with  a 
higher  metric  merges  into  the  correct  path.  The  ensemble  average 
of  the  expected  number  of  bit  errors  for  such  an  error  event 
starting  at  a  given  time  is  upper  bounded  [10]  by 

~vR,k  -drkt-i)  „ 

P  <  2  /  [1-2  ]a 

-7  v  v  , 

-  (l-e/2  )  /[i-2  (1-e  /2)  ]  2 


This  bound  is  plotted  in  figure  7.3.6  for  several  code  rates  and 
constraint  length.  The  plot  shows  that  long  constraint  length 
should  be  used  to  combat  the  severe  user  interference.  Such  long 
constraint  lengths  make  Viterbi  decoding  impractical.  Thus  the 
decoder  should  keep  track  of  only  a  subset  of  states,  the  size  of 
which  is  independent  of  the  constraint  length.  The  error 
probability  for  such  decoders  is  determined  by  the  size  of  the 
buffer  of  the  decoder.  The  next  section  investigates  the 
probability  of  buffer  overflow  versus  buffer  size  of  the  decoder 
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7.4  Decoding  complexity 


This  section  examines  a  decoding  scheme  for  the  binary 
erasure  channel  and  demonstrates  its  small  decoding  complexity  at 
a  sum-throughput  that  is  close  to  the  sum-cutoff  rate  of 
-1/v  .ln(2-2X2  c ) . 

The  decoder  keeps  a  list  of  the  active  states  of  the 
trellis.  A  state  is  active  if  there  is  a  path  leading  up  to  the 
state  such  that  the  code  sequence  of  the  path  is  agreeable  with 
the  deinterleaved  channel  sequence.  The  code  symbol  zero  (or 
one)  is  disagreeable  with  the  channel  symbol  one  (or  zero)  while 
the  channel  symbol  erasure  is  agreeable  with  the  code  symbols 
zero  or  one.  The  complexity  of  decoding  is  measured  by  the  random 
variable  n,  the  number  of  active  states  in  steady  state,  with 
probability  distribution  P(n).  The  number  of  active  states  is 
upper  bounded  by  treating  the  trellis  as  a  tree,  with  the  merged 
states  double  counted.  Since  long  constraint  length  codes  are 
used,  the  trellis  in  fact  resembles  a  tree  for  short  paths  off 
the  correct  path.  Long  incorrect  paths  are  highly  unlikely,  thus 
decoding  error  due  to  the  merging  of  an  incorrect  path  with  the 
correct  path  may  be  ignored. 

Let  n.  be  the  number  of  active  states  at  time  i  of  the 

trellis.  Let  e .  .  be  the  number  of  states  at  time  i+1  that  are 

made  active  from  the  active  state  t  at  time  j.  We  shall  drop  the 

superfluous  subscript  j  for  e,  . . .  Without  loss  of  generality,  the 

'  J 

correct  path  is  assumed  to  traverse  the  state  sequence  of  all 


1  j’OSlKo1 


lvl\ . 


ones,  and  the  plausible  states  are  labeled  from  2  to  n..  Thus 
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n  •  *  21  e* 

with  probability  generating  function 
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in  which  the  probability  generating  function 
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where 


rt(s)  *  Z  P(e,.  «i/r  erasures)  s 
**•  ‘ 


is  the  probability  generating  function  of  e^  conditioned  on  the 
number  of  erasures.  The  product  .  7T.-V*..*  (s)  results  from  the  fact 

t*i  *  *- 

that  the  random  variables  e^  given  r  erasures  are  independent  for 
various  t’s  and  that  the  probability  generating  function  of  a  sum 
of  independent  random  variables  is  equal  to  the  product  of  the 


generating  function  of  each  random  variable,  which  is  given  by 
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and 
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■  P(e  «0/r  erasures)  +  P(e.  *l/r  erasures)  s  + 
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for  2^t$m.  Consequently, 
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The  steady  state  probability  generating  function  S(s)  satisfies 
the  above  equation  after  dropping  the  subscripts  j  and  j+1. 
Differentiating  this  equation  with  respect  to  s,  and  putting  s«l 
(with  S'(l)-n  and  S(l)«l),  we  obtain 


n  -  {  1  -  [d+€)/2]  }  /  {1  -  2[{l+e)/2]  } 

\r  —  tit  i r 

{  1  -  [1  -  e  /2 ]  }  /  {  1  -  2  [1  -  e  /2]  } 


Hence  n  is  finite  provided 


T  <  -  1/v  In  (2-2X2  ) 


The  upper  bound  equals  the  maximum  sum-cutoff  rate  derived  in 
section  7.3.  The  complexity  n  versus  the  total  throughput  of  the 
users  is  plotted  in  figure  7.4.1. 

Unfortunately,  S(s)  is  unbounded  for  s>l,  which  will  be 
proved  in  the  remainder  of  this  section. 


Theorem  7.4.1 


If  S(s)  is  finite  in  the  interval  (1,1+e)  for  some  fc>0,  then  S(s) 
is  finite  for  all  s>l. 


Proof 


Rearranging  the  terms  in  the  previously  derived  equation 
for  S(s) ,  we  have 

1  r  v-r  -IV-r)  -LV-r) 

S ( s  )  -  S(s)  -  Z  „Cr  k  (1-fc)  s/(  1-2  +2  s  )  X 

S(  (1-2  +2  s  )  ) 

For  all  s>l,  the  argument  s  pf  the  function  S  on  the  left  hand 
side  can- be  readily  shown  to  be  larger  than  all  arguments  of  the 
function  S  on  the  right  hand  side.  The  largest  argument  on  the 

3-  \ 

right  hand  side  is  (1/2+1/2  s)  Assume  that  S(s)  is  finite  up  to 
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s=s ' .  Letting  s 1 > ( 1/2+1/2 , s) ^  (or  s<2/s'-l,  thus  s^<4s 1 -4/s '+1) 
implies  that  all  terms  on  the  right  hand  side  are  finite.  Thus 
all  values  of  S(s)  up  to  s=4s'-4/s'+l  are  finite,  since  the  left 
hand  side  is  a  sum  of  v+1  finite  terms.  Repeating  the  above 
argument,  we  have  S(s)  finite  for  all  finite  s,  provided  that 
S(s)  is  finite  in  the  interval  (1,1+S)  for  some  G>0. 

Q.E.D. 


Theorem  7.4.1 


S(s)  does  not  exist  for  s>l. 


Proof 

Modify  the  branching  process  considered  in  this  chapter  by  having 

a  genie  that  always  reveals  the  correct  state  ( us  n_.  =1’  except 

when  all  v  channel  symbols  are  erased  1  thus  n ,  ~2r\.  ■,)  ,  which 

J 

occurs  with  probability  eV. 


For  the  process  with  a  genie  the  steady  state  probability  of  the 
number  of  active  states  is 


v  v  lo92n 
P  (n)  =  ( l-ev)  (ev)  l 

.  v 

v  log2  e 
=  (l-ev)  n 


n  =  2  ,  k  integer 


Its  generating  function  does  not  exist  for  s>l.  Obviously,  the 
number  of  active  states  without  genie  is  greater,  and  it  will  not 
have  a  generating  function  for  s>l  either. 
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7.5  The  collision  channel  with  additive  Gaussian  noise 

This  section  shows  that  the  coding  that  is  originally 
intended  for  correcting  erasures  is  effective  for  combatting 
additive  Gaussian  channel  noise.  Therefore,  signaling  power  can 
be  reduced  compared  with  the  case  of  no  coding  at  hardly  any 
extra  cost. 

We  shall  assume  that  antipodal  signaling  is  used.  The 
results  obtained  in  this  section  can  be  extended  easily  to  other 
modulation  techniques.  The  channel  can  be  modeled  as  the 
memoryless  binary  input,  Gaussian  or  erasure  output  channel  shown 
in  figure  7.5.1a.  The  capacity  of  this  channel  is 

C  «  (  1  -fe)  of  the  capacity  of  the  channel  in  figure  7.5.1b 

»  (  1  -  6  )  {  -1/2  lo^2  K  e)  -  J^P(y)lo^P(y)  dy  } 

where 

p(y)  -  (P,  (y)  +  P-,  (y)  )/2 

and 

Px(y)  -  exp  [  -(y  -  t  (2E,/N#  )V/2  ]  //T* 

The  cuto  ff  rate  can  be  shown  to  be 

R,  -  -  loga[e+  (1-  €•)  (1+e  ,  )/2  ]  7.5.1 

Since  sequential  decoders  are  used  for  decoding,  we  concentrate 
on  the  study  of  the  cut-off  rate.  Figure  7.5.2  gives  a  plot  of  R# 


vv 


as  a  function  of  Es  /N,  for  several  values  of  .  The  value  of  R 
is  seen  to  saturate  at 


R  -  -  log  (  (1+  *)/2  ) 

•  2 

for  large  values  of  Et  /Ne  .  This  occurs  when  the  occurrence  of 
erasures  dominates  the  effect  of  the  Gaussian  noise  of  the 
channel. 


Suppose  we  allow  Gaussian  noise  to  decrease  Ro  from  R^  by 
a  fraction  f,  such  that 


R.  -  f  -  -  f  log2(l  -  e"T,/2) 


7.5.2 


Consequently,  the  maximum  sum- throughput  achievable  by  sequential 
decoding  is 


T  •  -  1/v  In  (2  -  2  X  2  ) 


7.5.3 


This  throughput  as  a  function  of  f  is  shown  in  figure  7.5.3  for 
v«2,3. 

What  remains  to  be  found  is  the  value  of  the  minimum  Es  /N 
that  is  required  in  order  to  achieve  the  maximum  value  of  T  for 
fixed  v  and  f.  Eliminating  R,  and  6  in  (7.5.1)  by  the 
substitutions 


#  -  -f  log2(l  -  e  /2) 


fc  ■  1  -  e 


with  T  given  in  (7.5.3),  we  have  after  some  simplification 


es/N6  -  -  In  [  (2  -  2  )  /  (1  -  2 


which  is  shown  in  figure  7.5.4  for  values  of  v  and  f.  The  figure 
shows  that  4  or  5  dB  of  signal  to  noise  ratio  is  sufficient  for 
operation.  This  compares  favorably  with  the  typical  10.5  dB 
required  for  uncoded  antipodal  signaling  for  a  bit  error  rate  of 


Appendix  7.1 

This  appendix  derives  the  sum-throughput  of  the  unslotted 

collision  channel  for  which  we  may  recover  the  front  part  of  a 

packet  up  to  the  point  where  it  starts  to  collide  with  another 

packet.  Let  1-p  be  the  probability  that  a  user  puts  a  packet  in 

a. slot.  The  probability  that  a  packet  is  collision  free  is 
21M-0 

p  *.  The  probability  that  the  packet  is  totally  lest  is  1-p.  . 
Let  D.  be  the  difference  in  the  starting  time  between  user  i  and 
user  j,  as  shown  in  figure  7.1.2.  The  probability  that  at  least 
a  fraction  f  of  a  packet  of  user  i  is  recovered  is 

M 

P,  *  TT  [  P(D.^f)  P(no  packet  for  user  j  in  the  two  slots 

I**’  that  overlaps  with  the  packet  of  user  i) 


+  P(D.  >f)  P(no  packet  for.  user  j  in  the  first  slot 
of  the  two  slots  that  overlap  with  the 


packet  of  user  i) 


[fp  +  (1-f)  p] 


Hence 


M-f  M-2 

P(f)  «  -d  P,/df  -  (M-l)  (1-p)  p  (l-f(l-p)) 


Therefore,  the  average  fraction  of  a  packet  that  is  recovered  is 


2CH-0 


P(f)  f  df  +  l.p 


(l-pM)  pM"‘  /[  (M-l )  (1-p)  ] 


The  sum  of  the  throughput  for  the  M  users  is 
T  -  M  (l-pM)  PM”’/  (M-l) 
which  has  a  maximum  value  of 

fM-O/M 

M/(M-1)  M/(2M-1)  [  <M-1)/(2M-1)  ] 

which  occurs  at 
p  -  [  (M-1)/(2M-1)  ] 

For  large  M,  this  throughput  approaches  1/4. 


Appendix  7.2 


This  appendix  derives  the  throughput  of  the  unslotted 
collision  channel  with  super-packeting.  Let  qsl~p  be  the 
probability  that  each  user  puts  a  super-packet  in  a  super-slot. 
For  user  i 

P(a  packet  is  collision  free  with  user  j) 

■  .  P(a  super-slot  for  user  j  does  not  start  within  the 

duration  of  the  packet). 

.  P(no  super-packet  in  the  superslot  of  user  j  that 
overlaps  with  the  packet) 

+  P(a  super-slot  for  user  j  starts  within  the  duration 
r.  ‘  of  the  packet) 

.  -  P(no  super-packet  in  either  super-slots  of  user  j  that 
overlap  with  the  packet) 

■  (u-l)/u  p  +  1/u  p  1 

Therefore,  a  packet  is  successfully  transmitted  with  probability 
{  (u-l)/u  p  +  1/u  p*  7  M  ' 
and  the  sum  throughput  is 
T  »  M  (1-p)  {  (u-l)/u  p  +  1/u  pX  ]M  * 

Letting  q>f/M  and  assuming  large  value  of  M  give  the  value  T  of 


.  £  {  (u-l)/u  (1  -  f/M)  +  1/u  (1  -  f/M)  } 

■  f  {  1  -  1/M  (  (u-l)/u  f  +  2/u  f  ) 

-  f  {  1  -  1/M  (  (u+D/u  f  )  1*"* 

—  f  (u+O/v* 

■  £  e 

which  has  a  mximum  value  of  e  * u/(u+l)  when  f*u/(u+l).  Thus  the 
capacity  approaches  e  for  large  u,  which  is  the  same  as  the 


capacity  for  the  slotted  channel. 
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